# Learning Rust #5 — Shipping a Real CLI (Args, Files, HTTP, Concurrency)


Time to ship something concrete. We’ll build a small CLI that:

  • reads a CSV of URLs,
  • fetches them concurrently with a configurable limit,
  • collects status code + content length,
  • and writes a pretty JSON report.

We’ll use clap for args, anyhow for errors, tokio + reqwest for async HTTP, and tracing for logs.


Project setup

Cargo.toml

[package]
name = "url-audit"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = { version = "4", features = ["derive"] }
anyhow = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.12", features = ["json", "gzip", "brotli"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "time", "sync"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] }
csv = "1"

If you’re on Windows behind a proxy or on a slow link, start with a lower concurrency (e.g. -c 8).


The CLI

src/main.rs

use anyhow::{Context, Result};
use clap::Parser;
use serde::{Deserialize, Serialize};
use tokio::time::{timeout, Duration};
use tracing::{info, Level};
use tracing_subscriber::EnvFilter;

#[derive(Parser, Debug)]
#[command(version, about = "Audit a list of URLs from a CSV and output JSON")]
struct Args {
    /// CSV path with a header 'url'
    #[arg(short, long)]
    input: String,
    /// Output JSON path
    #[arg(short, long, default_value = "report.json")]
    output: String,
    /// Max number of concurrent requests
    #[arg(short = 'c', long, default_value_t = 32)]
    concurrency: usize,
    /// Per-request timeout in seconds
    #[arg(short = 't', long, default_value_t = 10u64)]
    timeout: u64,
    /// Optional custom User-Agent header
    #[arg(long, default_value = "url-audit/0.1")]
    user_agent: String,
}

#[derive(Debug, Deserialize)]
struct InRow {
    url: String,
}

#[derive(Debug, Serialize)]
struct OutRow {
    url: String,
    status: Option<u16>,
    len: Option<u64>,
    error: Option<String>,
}

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env().add_directive(Level::INFO.into()))
        .init();

    let args = Args::parse();
    info!("reading input = {}", &args.input);

    let client = reqwest::Client::builder()
        .user_agent(args.user_agent.clone())
        .tcp_nodelay(true)
        .build()
        .context("building HTTP client")?;

    // Read CSV eagerly; fine for small/medium lists. For huge lists, stream lines.
    let mut rdr = csv::Reader::from_path(&args.input)
        .with_context(|| format!("opening CSV: {}", &args.input))?;
    let mut urls: Vec<String> = Vec::new();
    for rec in rdr.deserialize::<InRow>() {
        let row = rec.with_context(|| "parsing CSV row")?;
        if !row.url.trim().is_empty() {
            urls.push(row.url);
        }
    }
    info!(count = urls.len(), "loaded URLs");

    // Concurrency gate
    let sem = std::sync::Arc::new(tokio::sync::Semaphore::new(args.concurrency));
    let mut tasks = Vec::with_capacity(urls.len());
    for url in urls {
        let client = client.clone();
        let permit = sem.clone().acquire_owned().await?; // Owned permit drops with task
        let tmo = Duration::from_secs(args.timeout);
        tasks.push(tokio::spawn(async move {
            let _permit = permit; // keep until the end of this task
            fetch_row(&client, url, tmo).await
        }));
    }

    let mut out = Vec::with_capacity(tasks.len());
    for t in tasks {
        match t.await {
            Ok(row) => out.push(row),
            Err(e) => out.push(OutRow {
                url: "<join-error>".into(),
                status: None,
                len: None,
                error: Some(format!("join error: {e}")),
            }),
        }
    }

    std::fs::write(&args.output, serde_json::to_vec_pretty(&out)?)
        .with_context(|| format!("writing {}", &args.output))?;
    info!("wrote {} rows to {}", out.len(), &args.output);
    Ok(())
}

async fn fetch_row(client: &reqwest::Client, url: String, tmo: Duration) -> OutRow {
    // Wrap the request future in a per-request timeout. The send() future owns
    // its parsed URL, so `url` stays available to build the OutRow in every arm.
    match timeout(tmo, client.get(&url).send()).await {
        Ok(Ok(resp)) => {
            let status = resp.status().as_u16();
            let len = resp
                .headers()
                .get(reqwest::header::CONTENT_LENGTH)
                .and_then(|v| v.to_str().ok())
                .and_then(|s| s.parse::<u64>().ok());
            OutRow { url, status: Some(status), len, error: None }
        }
        Ok(Err(e)) => OutRow { url, status: None, len: None, error: Some(e.to_string()) },
        Err(_) => OutRow { url, status: None, len: None, error: Some("timeout".into()) },
    }
}

Key details:

  • We gate concurrency with a Semaphore. Each task holds a permit until it completes.
  • We wrap each request in a per‑request timeout so slow hosts don’t stall the batch.
  • We don’t read bodies; we only inspect headers for Content-Length. (Servers may omit the header; in that case len is null.)

Sample input and run

urls.csv

url
https://www.rust-lang.org
https://example.com
https://httpbin.org/status/404

Run it:

RUST_LOG=info cargo run --release -- \
  -i urls.csv -o report.json -c 32 -t 10 --user-agent "url-audit/0.1"

report.json (snippet):

[
  { "url": "https://www.rust-lang.org", "status": 200, "len": 12345, "error": null },
  { "url": "https://example.com", "status": 200, "len": 648, "error": null },
  { "url": "https://httpbin.org/status/404", "status": 404, "len": null, "error": null }
]

Optional: stream results as they arrive

If the CSV is huge, you can stream results to disk instead of accumulating in memory. Replace the task join loop with a futures::stream::FuturesUnordered and write each row as soon as it resolves. For simplicity, this first version buffers in memory.


Tests (tiny but useful)

Add a small helper in src/lib.rs just to demonstrate unit tests:

pub fn parse_len(s: &str) -> Option<u64> {
    s.parse().ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_len() {
        assert_eq!(parse_len("123"), Some(123));
        assert_eq!(parse_len("x"), None);
    }
}

Run:

cargo test

Packaging and release builds

Build a release binary:

cargo build --release
  • Linux/macOS: target/release/url-audit
  • Windows: target\release\url-audit.exe

If you need a fully static Linux build, target MUSL rather than the default glibc target (x86_64-unknown-linux-gnu still links glibc dynamically):

rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

(For cross-compiling from other hosts, tools like cross handle the linker and toolchain setup for you.)


Troubleshooting

  • Many timeouts: increase -t, lower -c, or verify network/DNS.
  • Proxy: configure environment variables (HTTP_PROXY, HTTPS_PROXY).
  • Memory spikes: stream CSV and results incrementally instead of buffering.
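For persistent timeouts, retrying with exponential backoff (explored in exercise 2 below) is the usual fix. A minimal sketch of the delay schedule; the 250 ms base and 10-second cap are illustrative choices, not values from the tool above:

```rust
use std::time::Duration;

// Delay doubles per attempt, capped so late retries don't wait forever.
fn backoff_delay(attempt: u32, base_ms: u64) -> Duration {
    let ms = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(ms.min(10_000))
}

fn main() {
    for attempt in 0..6 {
        println!("attempt {attempt}: wait {:?}", backoff_delay(attempt, 250));
    }
}
```

In the async code, you would `tokio::time::sleep(backoff_delay(attempt, base)).await` between attempts, ideally with some jitter so retries from many tasks don't synchronize.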

Exercises (15–30 minutes)

  1. HEAD first: try a HEAD request and fall back to GET if the server returns 405 Method Not Allowed.
  2. Retry policy: add --retries N with exponential backoff for transient errors.
  3. CSV enrichment: add input columns (label, category) and include them in the output JSON.
  4. Metrics: print a summary table with counts per status class (2xx/3xx/4xx/5xx) and average content length.
  5. Stream writer: write one JSON object per line (NDJSON) as tasks complete, so memory usage stays flat.
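As a starting point for exercise 4, status codes can be bucketed into classes with only the standard library. The sample statuses here are made up for illustration:

```rust
use std::collections::BTreeMap;

// Map a status code to its class for the summary table.
fn status_class(status: u16) -> &'static str {
    match status / 100 {
        2 => "2xx",
        3 => "3xx",
        4 => "4xx",
        5 => "5xx",
        _ => "other",
    }
}

fn main() {
    // Hypothetical statuses collected from a run.
    let statuses = [200u16, 200, 301, 404, 503];
    let mut counts: BTreeMap<&str, u32> = BTreeMap::new();
    for s in statuses {
        *counts.entry(status_class(s)).or_insert(0) += 1;
    }
    // BTreeMap iterates in key order, so classes print sorted.
    for (class, n) in &counts {
        println!("{class}: {n}");
    }
}
```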

What I learned shipping this

  • Concurrency wants a gate: a semaphore makes back‑pressure explicit and easy to reason about.
  • Timeouts are non‑negotiable for robust network tools.
  • clap + anyhow + tracing yields CLIs that are friendly to users and maintainers.

That’s it for this mini‑series! Next, I’ll likely explore either FFI + unsafe (just enough to be safe) or a small Web API service with actix or axum to apply the same error handling/testing patterns to HTTP servers.


Thanks for reading my blog post! Feel free to check out my other posts or contact me via the social links in the footer.

