# A Practical Guide to Time for Developers: Part 1 — What time is in software (physics + agreements)

Source: Dev.to

## Preface to the series

I was tasked with synchronizing time across N computers with ~1 nanosecond accuracy. Not “a laptop over Wi-Fi” — a controlled wired setup where hardware timestamping and disciplined clocks make that goal at least a meaningful engineering target.

At first it sounded trivial. We learn clocks, dates, and time zones as kids. How hard can it be? The industry already has a standard solution: Precision Time Protocol (PTP). But I wanted to look inside the protocol and understand what it actually does. I expected it to be the easiest part of the whole story.

Instead I ran straight into a wall of concepts: TAI vs UTC, epochs, leap seconds, RTC vs system clock, wall clock vs monotonic time, time zones, naïve timestamps. It turns out “time” is not a single thing — it’s physics, standards, and human conventions layered on top of each other.

I searched for a single article that explains the whole chain — something like “Time for software developers: zero to hero” or “From RTC to PTP” — and couldn’t find it. So I decided to write the guide I wished existed: a practical manual for developers that covers the essential concepts, the typical failure modes, and the protocols and algorithms we use to keep and distribute time.

This series has four parts:

- What time is (in software) — what exactly we’re tracking, and what “correct” even means.
- How a computer keeps time — where ticks come from, how clocks drift, and why operating systems maintain multiple clocks.
- How systems share time — NTP vs PTP, timestamping, asymmetry, and what really limits accuracy.
- What can go wrong (and how you detect it) — validation, monitoring, failure modes, and security/trust of time sources.

Let’s start with the foundation: what is time — physics or agreements?

You already know what time feels like. That’s the trap. In software, “time” is not one thing. It’s a mix of physical reality (oscillators drift, signals take time to travel), standards (UTC, leap seconds), and human conventions (time zones, calendars). If you don’t separate these layers, you end up building systems that look correct in tests and then collapse in production — usually around midnight, DST, or a “rare” edge case.

This part builds a simple foundation: what exactly is the thing we’re tracking when we say “time”? Most confusion comes from mixing these up.

## Four different problems people call “time”
People say “time,” but they might mean four totally different things, and each one requires a different kind of clock, API, and mental model.

## 1) Time-of-day (civil time)

Question: “What date/time is it right now?”

This is the time humans care about: calendars, weekdays, business hours, “yesterday,” tax reports, contracts.

Used for: logs, UI, audit trails, business processes, legal records.

Typical failure modes:

- DST: the same local time can happen twice, or not happen at all.
- Time zones: “10:00” without a zone is not a timestamp, it’s a vague sentence.
- Clock corrections: timestamps can jump forward/backward when the system is adjusted.

Rule of thumb: civil time is great for human meaning, not for measuring anything.

## 2) Duration / intervals

Question: “How long did it take?” / “Wait 500 ms.”

This is not “date/time.” This is elapsed time. You don’t want it to jump. You want it to be steady and monotonic.

Used for: timeouts, retries, benchmarks, scheduling, rate limiting.

Typical failure modes:

- Using wall clock for timeouts → timeout triggers instantly or never triggers after a time correction.
- Negative durations (“operation took -3 ms”) because the clock moved backwards.
- Inconsistent metrics when different machines have different offsets.

Rule of thumb: durations must come from a clock that only moves forward.

## 3) Ordering / causality

Question: “Which event happened first?” (especially across threads/processes/machines)

This is the one that causes the most hidden damage. Humans intuitively think timestamps imply ordering. In distributed systems, that’s often false.

Used for: distributed tracing, message processing, state machines, replication, conflict resolution.

Typical failure modes:

- Two machines disagree about “now” → you see “future” events in logs.
- Network delay/scheduling jitter reorder events even if clocks are “pretty good.”
- You use timestamps to order messages and occasionally violate invariants (“this update happened before its cause”).

Rule of thumb: if correctness depends on ordering, don’t quietly rely on wall-clock time alone. Use explicit ordering mechanisms (sequence numbers, causality-aware designs, etc.) and treat timestamps as metadata.

## 4) Frequency / rate

Question: “Are two clocks running at the same speed?”

This isn’t “what time is it,” it’s how fast time passes. For high-precision work (PTP, measurement, telecom), this matters as much as absolute offset.

Used for: high-precision sync, control loops, telecom, measurement systems, sampling, sensor fusion.

Typical failure modes:

- You correct offset but ignore drift → you constantly “chase” the reference.
- Short-term jitter ruins measurements even if average offset looks good.
- You assume nanosecond resolution implies nanosecond accuracy.

Rule of thumb: precision time is always a control problem: you manage both offset (sync) and rate (syntonization).

## The category mistake that creates most time bugs

A lot of bugs are simply using a tool from one category to solve another.
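The time-of-day vs duration split maps directly onto clock APIs. A minimal Python sketch of the distinction (the callable passed in is just a placeholder for real work):

```python
import time

def duration_wrong(fn):
    # Wall clock (time-of-day): can be stepped by NTP corrections, manual
    # changes, or VM migrations, so this difference can even be negative.
    start = time.time()
    fn()
    return time.time() - start

def duration_right(fn):
    # Monotonic clock: only moves forward, unaffected by wall-clock steps.
    start = time.monotonic()
    fn()
    return time.monotonic() - start

elapsed = duration_right(lambda: time.sleep(0.01))  # always >= 0
```

Both functions return the same number almost all of the time, which is exactly why the first one survives code review and then fails in production.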
Classic example: using civil time to measure durations.

- You record start = wall_clock_now()
- You record end = wall_clock_now()
- You compute end - start

It works… until the system clock is adjusted (NTP/PTP correction, manual change, VM migration, DST misconfig). Then the wall clock can jump backwards, and your “duration” becomes negative, your retry logic breaks, or your timeout never fires. That’s not a rare corner case. It’s an inevitable result of mixing categories.

If you remember one thing from this section, remember this: Time-of-day is for meaning. Duration is for measurement. Ordering is for correctness. Frequency is for precision.

## Basic vocabulary that prevents endless confusion

When someone says “we need 1 ns accuracy,” the only correct first reaction is: accuracy of what, relative to what, over what time window, and how will we measure it? If you don’t pin down the vocabulary, teams end up arguing for weeks while everyone is technically correct in their own private definition. Below are the terms you must keep separate.

## Resolution

What it is: the smallest step your clock can represent or report. Example: a timestamp API that returns nanoseconds has 1 ns resolution.

What it is not: a guarantee that the clock is correct to 1 ns. A clock can happily produce nanosecond-looking numbers while being microseconds (or milliseconds) away from the truth. This is why “we have nanosecond timestamps” is almost meaningless by itself.

## Precision

What it is: how fine your measurement or reporting is — how many digits you output and how repeatable your measurement process is. Precision often gets confused with resolution. A useful way to think about it:

- Resolution is “how small a step the counter can show.”
- Precision is “how finely we can measure and how consistent our measurement results are.”

You can have high precision measurements of a clock that is not accurate. You can also have a very accurate system that still reports in coarse units.

## Accuracy

What it is: how close your clock is to a reference (a trusted source, or “true time” in some defined sense). Accuracy always depends on:

- the reference (UTC? TAI? GPS? a grandmaster clock?),
- the path (network delays),
- the method (hardware timestamping vs software),
- the measurement point (where you observe time).

So “1 ns accuracy” without specifying the reference and measurement method is not a requirement — it’s a slogan.

## Stability

What it is: how consistently the clock runs over time.
In other words: how noisy it is and how much its rate changes. Two systems can have the same accuracy at a single moment and wildly different stability:

- One stays close for hours.
- The other drifts immediately and needs constant correction.

In practice, stability is what determines how hard your synchronization algorithm has to work.

## Offset

What it is: the difference between your clock and the reference right now. If the reference says 12:00:00.000000000 and you say 12:00:00.000000500, your offset is +500 ns.

Offset is the number people usually mean when they casually say “we’re synced to X.” But offset alone is not the full story, because it doesn’t tell you how noisy that offset is or how it behaves over time.

## Jitter

What it is: short-term variation — the “shake” around the average. If you measure offset once per second and the values bounce around like +20 ns, -15 ns, +35 ns, -10 ns... that bounce is jitter.

Jitter matters because many systems care about instantaneous behavior, not just long-term average. A “perfect” average offset is useless if the clock is too noisy for your application.

## Wander

What it is: slow changes over longer timescales — the “drift of the drift.” Where jitter is rapid noise, wander is a slow trend: temperature changes, oscillator aging, environmental effects, network path changes that persist. Wander is what makes a system look great in a short demo and gradually fall apart over hours or days if the control loop can’t track it.

## Synchronization vs syntonization (phase vs rate)

This is one of the most important distinctions in precision time, and it’s usually not named explicitly — which is why people get confused.

- Synchronize = align phase → reduce offset. “Make our timestamps match right now.”
- Syntonize = align rate → reduce drift. “Make our clocks run at the same speed.”

If you only synchronize (phase) but don’t syntonize (rate), you get a system that constantly drifts away and needs repeated “kicks” back into place. If you syntonize well, the system stays close with small, smooth corrections.
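Offset, jitter, and drift can all be extracted from repeated offset measurements. A sketch using sample values like the ones above (the numbers are hypothetical, and a real servo would use far more samples and robust filtering):

```python
from statistics import mean, stdev

# Hypothetical offset samples (local clock minus reference), in nanoseconds,
# taken once per second.
samples_ns = [20, -15, 35, -10, 25, -5, 30, 0]

offset_ns = mean(samples_ns)   # phase error: what "we're synced to X" usually means
jitter_ns = stdev(samples_ns)  # short-term noise around that average

# Crude rate (drift) estimate: least-squares slope in ns per sample interval.
# A nonzero slope means the clocks run at different speeds (syntonization work).
n = len(samples_ns)
t = list(range(n))
t_mean, s_mean = mean(t), mean(samples_ns)
drift_ns_per_s = (
    sum((ti - t_mean) * (si - s_mean) for ti, si in zip(t, samples_ns))
    / sum((ti - t_mean) ** 2 for ti in t)
)
```

The three numbers are exactly the feedback signals a sync loop consumes: offset drives phase corrections, drift drives rate corrections, and jitter tells you how aggressively you can react without chasing noise.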
That’s why protocols like PTP are not “set the time once and forget it.” They run a continuous control loop: measure offset, estimate delay, correct phase and rate, and fight noise (jitter) and slow effects (wander).

If you want a single mental model: precision time is control theory applied to clocks. The numbers (offset/jitter/wander) are the feedback signals; synchronization and syntonization are the control objectives.

## Timescales: what your timestamps are actually referencing

A timestamp looks like a number. That’s why developers treat it like a number. But a timestamp is not “time.” It’s a coordinate in some system — and the most important part of that coordinate system is the timescale: what kind of “time” this number is measuring. If two systems use different timescales, their timestamps can be perfectly well-formed and still be fundamentally incomparable.

## What is a timescale?

A timescale is a definition of how seconds are counted and how that count is anchored to reality. A useful way to think about it:

- It defines what “one second” means (atomic seconds vs adjusted seconds).
- It defines whether the count is continuous or can jump.
- It defines how it relates to civil time (what humans call “UTC time”).

So when someone says “store timestamps in UTC,” they’re implicitly making a choice about a timescale — and about how the system behaves during edge cases.

## TAI (International Atomic Time)

TAI is the cleanest mental model for engineers:

- it is a continuous count of atomic seconds
- it does not have leap seconds
- it does not care about Earth’s rotation

TAI is what you’d want if your only goal was a global, steady clock that never inserts weird discontinuities. The downside is social, not technical: people don’t live in TAI. Civil time is defined using UTC.

## UTC (Coordinated Universal Time)

UTC is the time humans and laws use. It is designed to stay close to Earth rotation, which is irregular. To keep UTC aligned with that, the standard allows leap seconds. That single detail has a huge consequence: UTC is not guaranteed to be perfectly continuous.

Most of the time, UTC behaves like a normal continuous timescale. But around leap seconds, systems can:

- repeat a second,
- represent 23:59:60.

So “UTC” is a civil agreement that often behaves like a smooth clock — until it doesn’t. This is why developers eventually run into bugs that sound impossible:

- “Why did the same timestamp appear twice?”
- “Why did time go backwards for a second?”
- “Why do two machines disagree about UTC during the same minute?”

## GPS time (and other system times)

GPS time is a common example of a system timescale:

- it is continuous (no leap seconds)
- it is used internally by a technical system because continuity is convenient
- it has a known offset relative to other scales (like UTC/TAI), but that offset is not “magically applied” everywhere the same way

And GPS is not alone.
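Working with several timescales mostly reduces to bookkeeping of offsets between them. A sketch, with one important caveat: the constants below have been correct since the leap second at the end of 2016, but a real system must take them from a maintained leap-second table rather than hardcoding them.

```python
# Offsets between timescales, valid since the 2017-01-01 leap second.
# Do NOT hardcode these in production; they change when UTC leaps.
TAI_MINUS_UTC = 37  # seconds
GPS_MINUS_UTC = 18  # seconds
# GPS was aligned with UTC in 1980 and never leaps, so TAI - GPS is fixed:
TAI_MINUS_GPS = TAI_MINUS_UTC - GPS_MINUS_UTC  # always 19 s

def utc_to_tai(utc_unix_s: int) -> int:
    """Map a UTC-based Unix second count onto a TAI-based count."""
    return utc_unix_s + TAI_MINUS_UTC

def utc_to_gps_scale(utc_unix_s: int) -> int:
    """Map a UTC-based Unix second count onto a GPS-scale count."""
    return utc_unix_s + GPS_MINUS_UTC

t_utc = 1_700_000_000
t_tai = utc_to_tai(t_utc)
t_gps = utc_to_gps_scale(t_utc)
```

The arithmetic is trivial; the engineering problem is knowing which scale a given number is in, and keeping the offset table current.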
Many systems use their own “continuous time” internally because it simplifies math and avoids leap-second edge cases. The important point isn’t the details of GPS time. It’s the category: many technical systems use a continuous timescale internally and only convert to UTC for humans.

## You don’t need to memorize offsets — you need the contract

At this stage, memorizing “how many seconds UTC differs from TAI” is not the goal. You can look up numbers. The goal is to internalize this: in software, “time” is usually a number plus a contract. The contract answers questions like:

- What is it anchored to? (UTC? TAI? a grandmaster? device-local monotonic time?)
- Is it continuous, or can it jump?
- What happens during leap seconds? (step? smear? ignore? represent 23:59:60?)
- How do we convert it for humans?

Once you treat timestamps as “numbers with contracts,” a lot of time-related confusion disappears — and the rest becomes an engineering problem you can actually reason about.

## Leap seconds: the edge case that isn’t optional

Leap seconds exist for a simple reason: Earth is not a perfect clock. Its rotation speed changes slightly due to geophysics, tides, atmosphere, even large-scale events. But civil time is supposed to stay roughly aligned with the Sun (“noon should be around when the Sun is highest”). So UTC is designed to track Earth rotation closely enough — and when the gap grows too large, UTC is adjusted by inserting (and in theory removing) a second.

That’s the astronomy story. Here’s the software story: the real problem is not that leap seconds exist. The real problem is that systems don’t agree on how to implement them.

## The trap: “UTC” is not one behavior

If your fleet contains different operating systems, different kernels, different NTP/PTP stacks, different cloud providers, or even different configuration defaults, you will encounter multiple “UTC behaviors” in the wild. That means two machines can both claim “UTC” and still produce timestamps that aren’t directly comparable during a leap-second event.

## Common behaviors you will encounter

1) Explicit leap second (23:59:60)

Some systems model the extra second as a real extra label in the clock representation. This is conceptually honest: there really is an inserted second. But it breaks assumptions everywhere:

- parsers that reject :60
- sorting logic that doesn’t expect it
- “every minute has exactly 60 seconds” code

2) Step / repeat / jump

Other systems handle the event by effectively repeating a second or stepping the time.
From a developer perspective this looks like:

- timestamps that stop moving forward for a moment
- a repeated time value
- or “time went backwards,” depending on representation

This is poison for anything that assumes monotonic behavior from civil time, especially ordering and durations.

3) Smear (stretching time over a window)

Instead of inserting a visible extra second, some environments “smear” it: they slightly slow down (or speed up) the clock over a window so that the leap second is absorbed smoothly. This avoids a hard discontinuity, which is great for many systems. But it introduces a different kind of inconsistency:

- during the smear window, your “UTC” is not exactly UTC
- two systems can disagree because one smears and the other doesn’t
- comparisons across vendors become tricky (“why are we off by hundreds of ms even though we’re both ‘UTC’?”)

## Why this matters even if leap seconds are rare

You might think: “Leap seconds almost never happen. Who cares?” Two reasons you should care anyway:

- The edge case exists at the standards level, so it shows up in libraries, operating systems, and infrastructure — whether you like it or not. You inherit it.
- Distributed systems amplify rare events. One leap second can create: broken ordering across machines, weird negative durations in logs/metrics pipelines, parsing failures in analytics, and incident timelines that don’t line up when you need them most.

## What you must decide early (policy, not implementation)

If your system needs strict timestamp comparisons, you need an explicit policy. Not a vibe. A policy.

- Do you store time-of-day as UTC timestamps, or as a continuous internal timescale? (Many serious systems store continuous internal time and convert for display.)
- Do you require “true UTC,” or are you okay with smeared UTC? (If you compare across cloud providers, “okay with smear” might be forced on you.)
- Where do you do conversions? Store canonical time internally and convert at the edges (UI, reporting), or the other way around?

You don’t have to solve leap-second handling in Part 1. That comes later. But you must do the one thing that prevents surprise: acknowledge that “UTC” is not a single universal runtime behavior.

## Epochs and units: a timestamp is a coordinate system

Almost all practical timestamps in software are just a coordinate: “X units since some chosen origin.” That origin is an epoch. The unit is seconds / milliseconds / nanoseconds (or sometimes “ticks”). Together they define the coordinate system your entire platform will live in.

Most of the time we treat epoch choice as a boring implementation detail. And it is mostly engineering convenience — until you need long-term compatibility, cross-system integration, or debugging.
Then epoch and units suddenly become the difference between “obvious” and “impossible.”

## Epoch: what is your “zero”?

An epoch is simply the timestamp you call “0.” Common epochs you’ll encounter:

- Unix epoch: 1970-01-01 (the usual “POSIX time” family)
- System uptime / boot time: epoch = when the OS booted
- Process start: epoch = when a process started
- Custom epochs: sometimes chosen for storage size, legacy reasons, or protocol specs

None of these is inherently “better.” They serve different purposes. The important thing is: once you choose an epoch for storage or APIs, you’ve created a contract. Changing it later is like changing the unit of distance in the middle of a highway.

## Units: the silent multiplier that breaks everything

A timestamp number is meaningless unless you know its unit. The most common production bug here is not exotic. It’s this:

- someone sends milliseconds,
- someone reads seconds,
- everything looks “roughly right” in small tests,
- and then you ship timestamps that are off by 1000×.

If your codebase uses multiple units, enforce one of these rules:

- include the unit in the name (timestamp_ms, timeout_ns)
- or use a strong type / duration type system
- or centralize conversions in one place

Don’t rely on comments and good intentions.

## Integers vs floats: why “it fits in a double” is not a plan

Floats look convenient because you can write 1700000000.123456789. The problem: floating point has limited precision, and the bigger the number gets, the fewer distinct fractional steps you can represent. So you end up silently losing sub-millisecond precision (or worse) depending on magnitude. This is especially important once you claim nanoseconds. If you represent “nanoseconds since 1970” as a float, you’re basically begging for precision loss. Prefer this instead:

- store and transport timestamps as integers
- attach the unit explicitly
- if you need fractions for display, convert at the edges

## You must label the kind of timestamp, not just the number

Even if you know the epoch and unit, you still need to know what kind of “time” it is. Be explicit about whether a timestamp is:

- Time-of-day (civil / wall-clock): anchored to a UTC-like timescale. Comparable across machines if they share the same time source and leap-second policy.
- Monotonic-ish (elapsed time): anchored to boot or process start. Great for measuring durations and scheduling. Usually meaningless to compare across machines, and often not meaningful after reboot.
- Logical (ordering): anchored to an ordering scheme (sequence numbers, causality). Comparable for ordering, not for “real-world time.”

This is the difference between a useful timestamp and a number that accidentally sorts most of the time.

## The classic “1970” bug and what it really means

When someone says: “Why is this event from 1970?” it usually means one of these happened:

- you interpreted milliseconds as seconds (or vice versa)
- you used the wrong epoch (boot-time treated as Unix time)
- you parsed a timestamp as local civil time when it was UTC (or vice versa)
- you truncated or overflowed (32-bit seconds, wrong cast)
- you mixed timescales (rare, but catastrophic when it happens)

The “1970” symptom is your system screaming: your coordinate system is inconsistent.

## Practical rules (expanded)

- Always know your epoch and your unit. If you can’t answer both instantly, you don’t have a timestamp — you have a random number.
- Prefer integer storage/transport. Encode the unit in the API/type/name.
- Treat timestamp types as separate domains: wall-clock for human meaning, monotonic for durations, logical for ordering.
- Never mix different timestamp kinds without explicit conversion and a clear reason.
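Both the unit bug and the float trap are easy to demonstrate. A minimal sketch:

```python
from datetime import datetime, timezone

# The classic "1970" bug: feed a value in seconds to code expecting
# milliseconds (or vice versa) and events land decades away.
t_s = 1_700_000_000                                            # ~2023-11-14, seconds
ok = datetime.fromtimestamp(t_s, tz=timezone.utc)
misread = datetime.fromtimestamp(t_s / 1000, tz=timezone.utc)  # January 1970

# The float trap: "nanoseconds since 1970" needs ~61 bits today, but a
# double has a 53-bit mantissa, so adjacent doubles are ~256 ns apart there.
t_ns = 1_700_000_000_000_000_000
one_ns_lost = float(t_ns + 1) == float(t_ns)  # True: the +1 ns disappears
```

The second half is why "it fits in a double" is not a plan: the number fits, but the nanoseconds silently round away.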
## Why this matters for the rest of the series

Protocols and OS clocks become much easier to understand once you separate what “time” means (timescale and behavior) from how it’s encoded (epoch + unit + representation). In Part 2 we’ll look at where these numbers come from inside one machine, and why “the system time” is actually several different clocks with different guarantees.

## Civil time: time zones, DST, and calendars are not “formatting”

A lot of engineers treat time zones as UI: “we’ll store UTC and just format it for the user.” That instinct is half right. The other half is where projects die: civil time is not a formatting layer. It’s a set of rules that changes across geography and across history. If your system interacts with humans, payroll, contracts, schedules, billing cycles, or “days,” you are doing civil-time logic whether you admit it or not.

## Civil time is a rules database, not a law of physics

Time zones are not just “UTC+2.” They are effectively:

- a region identifier (e.g., “Europe/Vienna”, not “+01:00”)
- plus a historical database of changes: offsets change over the decades, DST rules change (sometimes with very short notice), and sometimes entire countries switch policy

So “local time” is not stable unless you store the zone identifier and consult a time zone database for the correct rule at that date. If you store only “the UTC offset at the moment,” you lose the ability to reproduce civil time correctly later.

## DST creates two kinds of broken times: ambiguous and missing

Daylight saving time is where naive systems reveal themselves.

Ambiguous local time (fall back): the clock is set back, so a local time interval happens twice. A local timestamp like 2026-10-25 02:30 is ambiguous in many European zones: it could refer to the “first 02:30” or the “second 02:30.” Without extra context (offset or zone rule), that timestamp is not uniquely defined.

Missing local time (spring forward): the clock jumps forward, so some local times never happen. A local timestamp like “02:30” on the DST jump day might literally be invalid: it never occurred. This is why “local time as a primary storage format” is a trap. Your database will happily store impossible moments.

## Calendars are hostile: “day” and “month” are not durations

Civil time mixes clocks with calendars, and calendars don’t behave like physics.

“Add 24 hours” ≠ “add 1 day”. Adding 24 hours means “exactly 86,400 seconds later.” Adding 1 day often means “same local clock time on the next calendar day.” During DST transitions, those diverge. Some days are 23 hours, some are 25. So “next day at 09:00” is not always “+24h”.

“Add 1 month” is not a duration at all: a month is not a fixed number of seconds.
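Both broken-time cases and the 24-hours-vs-1-day divergence can be reproduced with Python's zoneinfo module, using the same 2026-10-25 EU fall-back date (this assumes a tz database is available on the machine):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

vienna = ZoneInfo("Europe/Vienna")

# Fall back: 02:30 happens twice on 2026-10-25; PEP 495's `fold` picks which.
first = datetime(2026, 10, 25, 2, 30, tzinfo=vienna)           # CEST, +02:00
second = datetime(2026, 10, 25, 2, 30, fold=1, tzinfo=vienna)  # CET,  +01:00
offsets = (first.utcoffset(), second.utcoffset())

# "Add 1 day" keeps the local clock time, but 25 real hours elapse
# across this particular night.
before = datetime(2026, 10, 24, 9, 0, tzinfo=vienna)
next_day = before + timedelta(days=1)                          # 09:00 next day
real_elapsed = next_day.astimezone(timezone.utc) - before.astimezone(timezone.utc)
```

The same local string maps to two different instants, and "same time tomorrow" is 25 hours away: both facts are invisible if you only ever test in summer.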
A month is a calendar concept with edge cases: What is “one month after January 31”? February 28/29? March 3? “Clamp to end of month”? An error? There isn’t one universally correct answer. There are policies — and you must choose one explicitly.

## The hidden production bugs this causes

Civil-time mistakes usually appear as:

- duplicated timestamps in logs (same local time twice)
- scheduling drift (“meeting moved by one hour”)
- billing/payroll disagreements (“which day counts?”)
- impossible events (“this happened at a time that never existed”)
- long-term reproducibility issues (“it used to show 10:00, now it shows 11:00 for old data”)

The worst part: these bugs often don’t show up in unit tests, because tests don’t run across DST boundaries or historical rule changes.

## A policy that prevents endless pain (and where it breaks)

A sane default for most systems:

- Store and exchange timestamps in a single global standard (typically UTC-like).
- Convert to local time zones only at the edges (UI, reports).
- Keep calendar arithmetic explicit and isolated (and test DST boundaries).

Two important additions to make this actually work in real systems:

- When civil meaning matters (appointments, payroll, “local midnight”), store the time zone ID (e.g., “Europe/Vienna”), not just an offset.
- If you store recurring schedules (“every day at 09:00 local time”), store them as civil-time rules, not as precomputed UTC instants.

Why? Because recurring human schedules are defined in civil time — and civil time changes. If you internalize one idea from this section, make it this: local time is not a timestamp. It becomes a timestamp only when you attach a time zone rule set — and accept the edge cases.

## Physical time vs logical time (distributed systems reality)

Even if you had “perfect” clock synchronization, distributed systems still wouldn’t behave like a single machine. Reality gets in the way:

- network delay (packets take time to arrive),
- asymmetry (A→B delay is not necessarily equal to B→A),
- scheduling delays (your process didn’t run when you think it did),
- partial failures (timeouts, retries, partitions),
- and simply different perspectives of “now.”

This matters because developers use time for two very different purposes: to attach human meaning (“when did it happen?”) and to decide correctness (“which happened first?”). Those are not the same problem.

## Two broad approaches to ordering events

## Physical timestamps (“wall clock time”)

This is the familiar one: attach a wall-clock timestamp to events. Humans can read it, audit trails and legal records need it, it’s great for observability (“show me what happened around 12:03”), and it helps correlate events across services when sync is good enough.

Why it’s risky for correctness: physical time is an approximation. Even with good sync, you can still get:

- mis-ordering: event B appears “earlier” than its cause A because A’s message was delayed or A’s clock is slightly behind
- future events: logs show something that “happened in the future” relative to another machine
- time going backwards locally when clocks are stepped

So: wall-clock timestamps are excellent metadata. They are a weak foundation for correctness.

## Logical time (causality-aware ordering)

Logical time exists because “timestamp ordering” is not the same as “happened-before ordering.” The core idea is simple: A happened before B if A could have influenced B. Not because A’s timestamp is smaller.

Logical clocks (conceptually: Lamport timestamps and vector clocks) encode causality:

- If B observed A (directly or indirectly), B must be ordered after A.
- If two events are independent, they may be concurrent, and ordering them is a policy decision, not a fact revealed by a clock.

Why it’s not a replacement for wall time: logical time doesn’t tell you “it’s 12:03.” It tells you “this depends on that.”

## Mature systems use both (and are explicit about it)

Most serious systems end up with two parallel layers:

- Physical time for observability, audit, user-facing meaning, “what happened when”
- Logical ordering / protocol guarantees where correctness depends on ordering

This is also how you keep sane during incidents: physical timestamps help humans reconstruct timelines, and ordering guarantees help the system stay correct even when time is messy.

## The takeaway

If you need correct ordering, don’t silently assume wall-clock time gives it.
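A minimal sketch of one such explicit mechanism, a Lamport clock (illustrative, not a production implementation):

```python
class LamportClock:
    """Counter-based logical clock: encodes happened-before, not wall time."""

    def __init__(self) -> None:
        self.t = 0

    def tick(self) -> int:
        # Local event (including sending a message): advance the counter.
        self.t += 1
        return self.t

    def observe(self, msg_t: int) -> int:
        # Receiving a message: jump past the sender's timestamp, so the
        # receive event is ordered after the send regardless of wall clocks.
        self.t = max(self.t, msg_t) + 1
        return self.t

a, b = LamportClock(), LamportClock()
a.tick()                    # a: 1
sent = a.tick()             # a: 2 -- the timestamp carried by the message
received = b.observe(sent)  # b: 3 -- strictly after the send
```

Note the limit mentioned above: two independent events can still end up with equal or misleadingly ordered counters. Lamport timestamps are consistent with causality; they don't record all of it (vector clocks do more, at a cost).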
Use wall-clock timestamps for meaning and correlation — but when correctness depends on “what happened first,” you need explicit ordering mechanisms (protocol guarantees, sequence numbers, causality-aware clocks, or designs that don’t depend on global time). Because in distributed systems, “now” is not a global fact. It’s a local opinion.

## Summary: what “time” is, in one sentence

In software, time is a continuously maintained estimate of some reference, expressed in a chosen coordinate system (timescale + epoch + units), and only then mapped into human conventions like calendars and time zones.

If that sounds heavier than “a number that increases,” good — because treating time as “just a number” is exactly how you end up with negative durations, duplicated local timestamps, and distributed logs that can’t be reconciled.

In the next part we’ll go one layer deeper and get practical: how a single computer actually keeps time — where ticks come from, what the OS does with them, why there are multiple clocks, and why “correcting the clock” can make time jump (and break anything that assumed it couldn’t).

## Developer rules (keep this as your reference card)

If you remember nothing else from this part, remember these:

- Decide what problem you’re solving: time-of-day vs duration vs ordering vs frequency.
- Store/transport canonical time in one global form (typically UTC-like).
- Measure durations with a monotonic clock, not wall time.
- Treat time zones/DST/calendar arithmetic as logic, not formatting.
- Be explicit about timescale, epoch, and units; prefer integer timestamps.
- Assume leap seconds exist — and that “UTC” may be implemented differently (including smearing).
- Don’t assume timestamps provide correct distributed ordering.
- For serious systems, treat time like a dependency: define trust, monitor offset/jitter, plan failure behavior.

These rules are defaults for most systems. High-precision setups add stricter constraints — we’ll get there when we talk about distributing time across machines.

## Appendix: collected lists from this part

- What time is (in software) — what exactly we’re tracking, and what “correct” even means.
- How a computer keeps time — where ticks come from, how clocks drift, and why operating systems maintain multiple clocks.
- How systems share time — NTP vs PTP, timestamping, asymmetry, and what really limits accuracy.
- What can go wrong (and how you detect it) — validation, monitoring, failure modes, and security/trust of time sources.

- DST: the same local time can happen twice, or not happen at all.
- Time zones: “10:00” without a zone is not a timestamp, it’s a vague sentence.
- Clock corrections: timestamps can jump forward/backward when the system is adjusted.

- Using wall clock for timeouts → timeout triggers instantly or never triggers after a time correction.
- Negative durations (“operation took -3 ms”) because the clock moved backwards.
- Inconsistent metrics when different machines have different offsets.

- Two machines disagree about “now” → you see “future” events in logs.
- Network delay/scheduling jitter reorder events even if clocks are “pretty good.”
- You use timestamps to order messages and occasionally violate invariants (“this update happened before its cause”).

- You correct offset but ignore drift → you constantly “chase” the reference.
- Short-term jitter ruins measurements even if average offset looks good.
- You assume nanosecond resolution implies nanosecond accuracy.

- You record start = wall_clock_now()
- You record end = wall_clock_now()
- You compute end - start

- Resolution is “how small a step the counter can show.”
- Precision is “how finely we can measure and how consistent our measurement results are.”

- the reference (UTC? TAI? GPS? a grandmaster clock?),
- the path (network delays),
- the method (hardware timestamping vs software),
- the measurement point (where you observe time).

- One stays close for hours.
- The other drifts immediately and needs constant correction.

- Synchronize = align phase → reduce offset. “Make our timestamps match right now.”
- Syntonize = align rate → reduce drift. “Make our clocks run at the same speed.”

- It defines what “one second” means (atomic seconds vs adjusted seconds).
- It defines whether the count is continuous or can jump.
- It defines how it relates to civil time (what humans call “UTC time”).

- it is a continuous count of atomic seconds
- it does not have leap seconds
- it does not care about Earth’s rotation

- repeat a second,
- represent 23:59:60,

- “Why did the same timestamp appear twice?”
- “Why did time go backwards for a second?”
- “Why do two machines disagree about UTC during the same minute?”

- it is continuous (no leap seconds)
- it is used internally by a technical system because continuity is convenient
- it has a known offset relative to other scales (like UTC/TAI), but that offset is not “magically applied” everywhere the same way

- What is it anchored to? (UTC? TAI? a grandmaster? device-local monotonic time?)
- Is it continuous, or can it jump?
- What happens during leap seconds? (step? smear? ignore? represent 23:59:60?)
- How do we convert it for humans?

- parsers that reject :60
- sorting logic that doesn’t expect it
- “every minute has exactly 60 seconds” code

- timestamps that stop moving forward for a moment
- a repeated time value
- or “time went backwards” depending on representation

- during the smear window, your “UTC” is not exactly UTC
- two systems can disagree because one smears and the other doesn’t
- comparisons across vendors become tricky (“why are we off by hundreds of ms even though we’re both ‘UTC’?”)

- The edge case exists at the standards level, so it shows up in libraries, operating systems, and infrastructure — whether you like it or not. You inherit it.
- Distributed systems amplify rare events. One leap second can create:
  - broken ordering across machines
  - weird negative durations in logs/metrics pipelines
  - parsing failures in analytics
  - incident timelines that don’t line up when you need them most

- Do you store time-of-day as UTC timestamps, or as a continuous internal timescale? (Many serious systems store continuous internal time and convert for display.)
- Do you require “true UTC,” or are you okay with smeared UTC? (If you compare across cloud providers, “okay with smear” might be forced on you.)
- Where do you do conversions? Store canonical time internally and convert at the edges (UI, reporting), or the other way around?

- Unix epoch: 1970-01-01 (the usual “POSIX time” family)
- System uptime / boot time: epoch = when the OS booted
- Process start: epoch = when a process started
- Custom epochs: sometimes chosen for storage size, legacy reasons, or protocol specs

- seconds (s)
- milliseconds (ms)
- microseconds (µs)
- nanoseconds (ns)

- someone sends milliseconds,
- someone reads seconds,
- everything looks “roughly right” in small tests,
- and then you ship timestamps that are off by 1000×.

- include the unit in the name (timestamp_ms, timeout_ns)
- or use a strong type / duration type system
- or centralize conversions in one place

- store and transport timestamps as integers
- attach the unit explicitly
- if you need fractions for display, convert at the edges

- Time-of-day (civil / wall-clock): anchored to a UTC-like timescale. Comparable across machines if they share the same time source and leap-second policy.
- Monotonic-ish (elapsed time): anchored to boot or process start. Great for measuring durations and scheduling. Usually meaningless to compare across machines, and often not meaningful after reboot.
- Logical (ordering): anchored to an ordering scheme (sequence numbers, causality). Comparable for ordering, not for “real-world time.”

- you interpreted milliseconds as seconds (or vice versa)
- you used the wrong epoch (boot-time treated as Unix time)
- you parsed a timestamp as local civil time when it was UTC (or vice versa)
- you truncated or overflowed (32-bit seconds, wrong cast)
- you mixed timescales (rare, but catastrophic when it happens)

- Always know your epoch and your unit. If you can’t answer both instantly, you don’t have a timestamp — you have a random number.
- Prefer integer storage/transport.
- Encode the unit in the API/type/name.
- Treat timestamp types as separate domains:
  - wall-clock for human meaning
  - monotonic for durations
  - logical for ordering
- Never mix different timestamp kinds without explicit conversion and a clear reason.

- what “time” means (timescale and behavior)
- from how it’s encoded (epoch + unit + representation)

- a region identifier (e.g., “Europe/Vienna”, not “+01:00”)
- plus a historical database of changes:
  - offsets change over the decades
  - DST rules change (sometimes with very short notice)
  - sometimes entire countries switch policy

- 2026-10-25 02:30

- Adding 24 hours means “exactly 86,400 seconds later.”
- Adding 1 day often means “same local clock time on the next calendar day.”

- What is “one month after January 31”?
- Is it February 28/29? March 3? “Clamp to end of month”? An error?

- duplicated timestamps in logs (same local time twice)
- scheduling drift (“meeting moved by one hour”)
- billing/payroll disagreements (“which day counts?”)
- impossible events (“this happened at a time that never existed”)
- long-term reproducibility issues (“it used to show 10:00, now it shows 11:00 for old data”)

- Store and exchange timestamps in a single global standard (typically UTC-like).
- Convert to local time zones only at the edges (UI, reports).
- Keep calendar arithmetic explicit and isolated (and test DST boundaries).
- When civil meaning matters (appointments, payroll, “local midnight”), store the time zone ID (e.g., “Europe/Vienna”), not just an offset.
- If you store recurring schedules (“every day at 09:00 local time”), store them as civil-time rules, not as precomputed UTC instants.

- network delay (packets take time to arrive),
- asymmetry (A→B delay is not necessarily equal to B→A),
- scheduling delays (your process didn’t run when you think it did),
- partial failures (timeouts, retries, partitions),
- and simply different perspectives of “now.”

- to attach human meaning (“when did it happen?”)
- to decide correctness (“which happened first?”)

- humans can read it
- audit trails and legal records need it
- it’s great for observability (“show me what happened around 12:03”)
- it helps correlate events across services when sync is good enough

- mis-ordering: event B appears “earlier” than its cause A because A’s message was delayed or A’s clock is slightly behind
- future events: logs show something that “happened in the future” relative to another machine
- time going backwards locally when clocks are stepped

- A happened before B if A could have influenced B. Not because A’s timestamp is smaller.
- If B observed A (directly or indirectly), B must be ordered after A.
- If two events are independent, they may be concurrent, and ordering them is a policy decision, not a fact revealed by a clock.

- correctness in replication, conflict resolution, messaging systems
- reasoning about distributed workflows and state machines
- avoiding “timestamp lies” when clocks disagree

- Physical time for observability, audit, user-facing meaning, “what happened when”
- Logical ordering / protocol guarantees where correctness depends on ordering

- physical timestamps help humans reconstruct timelines
- ordering guarantees help the system stay correct even when time is messy

- Decide what problem you’re solving: time-of-day vs duration vs ordering vs frequency.
- Store/transport canonical time in one global form (typically UTC-like).
- Measure durations with a monotonic clock, not wall time.
- Treat time zones/DST/calendar arithmetic as logic, not formatting.
- Be explicit about timescale, epoch, and units; prefer integer timestamps.
- Assume leap seconds exist — and that “UTC” may be implemented differently (including smearing).
- Don’t assume timestamps provide correct distributed ordering.
- For serious systems, treat time like a dependency: define trust, monitor offset/jitter, plan failure behavior.