# Learning Rust #5 — Shipping a Real CLI (Args, Files, HTTP, Concurrency)
Time to ship something concrete. We’ll build a small CLI that:
- reads a CSV of URLs,
- fetches them concurrently with a configurable limit,
- collects status code + content length,
- and writes a pretty JSON report.
We’ll use clap for args, anyhow for errors, tokio + reqwest for async HTTP, and tracing for logs.
Project setup
Cargo.toml
```toml
[package]
name = "url-audit"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = { version = "4", features = ["derive"] }
anyhow = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.12", features = ["json", "gzip", "brotli"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] }
csv = "1"
```

If you’re on Windows behind a proxy or on a slow link, start with a lower concurrency (e.g. `-c 8`).
The CLI
src/main.rs
```rust
use anyhow::{Context, Result};
use clap::Parser;
use serde::{Deserialize, Serialize};
use tokio::time::{timeout, Duration};
use tracing::{info, Level};
use tracing_subscriber::EnvFilter;

#[derive(Parser, Debug)]
#[command(version, about = "Audit a list of URLs from a CSV and output JSON")]
struct Args {
    /// CSV path with a header 'url'
    #[arg(short, long)]
    input: String,

    /// Output JSON path
    #[arg(short, long, default_value = "report.json")]
    output: String,

    /// Max number of concurrent requests
    #[arg(short = 'c', long, default_value_t = 32)]
    concurrency: usize,

    /// Per-request timeout in seconds
    #[arg(short = 't', long, default_value_t = 10u64)]
    timeout: u64,

    /// Optional custom User-Agent header
    #[arg(long, default_value = "url-audit/0.1")]
    user_agent: String,
}

#[derive(Debug, Deserialize)]
struct InRow {
    url: String,
}

#[derive(Debug, Serialize)]
struct OutRow {
    url: String,
    status: Option<u16>,
    len: Option<u64>,
    error: Option<String>,
}

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env().add_directive(Level::INFO.into()))
        .init();

    let args = Args::parse();
    info!("reading input = {}", &args.input);

    let client = reqwest::Client::builder()
        .user_agent(args.user_agent.clone())
        .tcp_nodelay(true)
        .build()
        .context("building HTTP client")?;

    // Read CSV eagerly; fine for small/medium lists. For huge lists, stream lines.
    let mut rdr = csv::Reader::from_path(&args.input)
        .with_context(|| format!("opening CSV: {}", &args.input))?;

    let mut urls: Vec<String> = Vec::new();
    for rec in rdr.deserialize::<InRow>() {
        let row = rec.with_context(|| "parsing CSV row")?;
        if !row.url.trim().is_empty() {
            urls.push(row.url);
        }
    }
    info!(count = urls.len(), "loaded URLs");

    // Concurrency gate
    let sem = std::sync::Arc::new(tokio::sync::Semaphore::new(args.concurrency));
    let mut tasks = Vec::with_capacity(urls.len());

    for url in urls {
        let client = client.clone();
        let permit = sem.clone().acquire_owned().await?; // Owned permit drops with task
        let tmo = Duration::from_secs(args.timeout);
        tasks.push(tokio::spawn(async move {
            let _permit = permit; // keep until the end of this task
            fetch_row(&client, url, tmo).await
        }));
    }

    let mut out = Vec::with_capacity(tasks.len());
    for t in tasks {
        match t.await {
            Ok(row) => out.push(row),
            Err(e) => out.push(OutRow {
                url: "<join-error>".into(),
                status: None,
                len: None,
                error: Some(format!("join error: {e}")),
            }),
        }
    }

    std::fs::write(&args.output, serde_json::to_vec_pretty(&out)?)
        .with_context(|| format!("writing {}", &args.output))?;

    info!("wrote {} rows to {}", out.len(), &args.output);
    Ok(())
}

async fn fetch_row(client: &reqwest::Client, url: String, tmo: Duration) -> OutRow {
    // Clone the URL so the timeout arm below can still report it after
    // `fut` has taken ownership of its copy.
    let u = url.clone();
    let fut = async move {
        match client.get(&u).send().await {
            Ok(resp) => {
                let status = resp.status().as_u16();
                let len = resp
                    .headers()
                    .get(reqwest::header::CONTENT_LENGTH)
                    .and_then(|v| v.to_str().ok())
                    .and_then(|s| s.parse::<u64>().ok());
                OutRow { url: u, status: Some(status), len, error: None }
            }
            Err(e) => OutRow { url: u, status: None, len: None, error: Some(e.to_string()) },
        }
    };

    match timeout(tmo, fut).await {
        Ok(row) => row,
        Err(_) => OutRow { url, status: None, len: None, error: Some("timeout".into()) },
    }
}
```

Key details:
- We gate concurrency with a `Semaphore`. Each task holds a permit until it completes.
- We wrap each request in a per‑request timeout so slow hosts don’t stall the batch.
- We don’t read bodies; we only inspect headers for `Content-Length`. (Servers may omit the header; in that case `len` is `null`.)
Sample input and run
urls.csv
```csv
url
https://www.rust-lang.org
https://example.com
https://httpbin.org/status/404
```

Run it:

```shell
RUST_LOG=info cargo run --release -- \
  -i urls.csv -o report.json -c 32 -t 10 --user-agent "url-audit/0.1"
```

report.json (snippet):

```json
[
  { "url": "https://www.rust-lang.org", "status": 200, "len": 12345, "error": null },
  { "url": "https://example.com", "status": 200, "len": 648, "error": null },
  { "url": "https://httpbin.org/status/404", "status": 404, "len": null, "error": null }
]
```

Optional: stream results as they arrive
If the CSV is huge, you can stream results to disk instead of accumulating in memory. Replace the task join loop with a futures::stream::FuturesUnordered and write each row as soon as it resolves. For simplicity, this first version buffers in memory.
Tests (tiny but useful)
Add a small helper in src/lib.rs just to demonstrate unit tests:
```rust
pub fn parse_len(s: &str) -> Option<u64> {
    s.parse().ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_len() {
        assert_eq!(parse_len("123"), Some(123));
        assert_eq!(parse_len("x"), None);
    }
}
```

Run:

```shell
cargo test
```

Packaging and release builds
Build a release binary:
```shell
cargo build --release
```

- Linux/macOS: `target/release/url-audit`
- Windows: `target\release\url-audit.exe`
If you need a Linux binary for a specific target, add the target and rebuild:

```shell
rustup target add x86_64-unknown-linux-gnu
cargo build --release --target x86_64-unknown-linux-gnu
```

(Note that glibc builds are still dynamically linked; for fully static MUSL builds and cross‑compilation, explore x86_64-unknown-linux-musl and tools like cross.)
Troubleshooting
- Many timeouts: increase `-t`, lower `-c`, or verify network/DNS.
- Proxy: configure environment variables (`HTTP_PROXY`, `HTTPS_PROXY`).
- Memory spikes: stream CSV and results incrementally instead of buffering.
Exercises (15–30 minutes)
- HEAD first: try a HEAD request and fall back to GET if the server returns `405 Method Not Allowed`.
- Retry policy: add `--retries N` with exponential backoff for transient errors.
- CSV enrichment: add input columns (`label`, `category`) and include them in the output JSON.
- Metrics: print a summary table with counts per status class (2xx/3xx/4xx/5xx) and average content length.
- Stream writer: write one JSON object per line (NDJSON) as tasks complete, so memory usage stays flat.
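For the retry exercise, the delay schedule is the only subtle part. A stdlib-only sketch (the `backoff_delay` helper and its parameters are hypothetical, not part of the tool):

```rust
use std::time::Duration;

// Exponential backoff: attempt 0 waits base_ms, each retry doubles the wait,
// capped at cap_ms so a long outage doesn't produce absurd delays.
fn backoff_delay(attempt: u32, base_ms: u64, cap_ms: u64) -> Duration {
    let factor = 1u64 << attempt.min(16); // clamp the shift to avoid overflow
    Duration::from_millis(base_ms.saturating_mul(factor).min(cap_ms))
}

fn main() {
    for a in 0..5 {
        println!("attempt {a}: wait {:?}", backoff_delay(a, 100, 2_000));
    }
}
```

In the fetch loop you would `tokio::time::sleep(backoff_delay(attempt, ...)).await` between attempts, retrying only on transport errors and 5xx responses; adding a little random jitter on top helps avoid thundering-herd retries.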
What I learned shipping this
- Concurrency wants a gate: a semaphore makes back‑pressure explicit and easy to reason about.
- Timeouts are non‑negotiable for robust network tools.
- `clap` + `anyhow` + `tracing` yield CLIs that are friendly to users and maintainers.
That’s it for this mini‑series! Next, I’ll likely explore either FFI + unsafe (just enough to be safe) or a small Web API service with actix or axum to apply the same error handling/testing patterns to HTTP servers.