Lux
AI Research
February 5, 2026

Why Geospatial AI Agents Are Uniquely Hard to Build

By Ariel Aviv

Over the past two years of building AI agents that design, validate, and optimize fiber-optic networks on real maps, we've learned something that most LLM-agent literature ignores: the physical world doesn't tolerate hallucination.

When a revenue agent drafts a mediocre email, a human catches it. When a coding agent writes a broken function, the compiler catches it. But when a geospatial agent routes a fiber cable through a river, proposes a splice closure floating 40 meters above a rooftop, or assigns 64 homes to a splitter that physically supports 32 — the failure mode is a crew showing up on-site with the wrong materials, the wrong plan, and a six-figure cost overrun.

This post breaks down the five fundamental constraints that make geospatial AI agents a distinct engineering challenge, the design principles we've arrived at after extensive iteration, and the pitfalls we'd warn other teams about before they start.


The Problem Space

Most AI agent frameworks assume the agent operates in a symbolic environment: text in, text out, maybe some tool calls to APIs. The implicit contract is that the LLM's output is either correct or cheaply correctable.

Geospatial agents break this contract in three ways:

  1. Outputs map to physical reality. A node placed at coordinates (31.7683, 35.2137) isn't an abstract data point — it's a cabinet that a construction crew will bolt to a concrete pad at that exact location. Precision isn't optional; it's the product.
  2. Constraints are governed by physics and regulation, not preference. An optical power budget isn't a soft guideline. If your signal drops below −28 dBm at the ONT, the customer gets no service. The agent doesn't get to be “close enough.”
  3. Errors compound spatially. A misplaced distribution closure doesn't just affect one connection — it cascades into every downstream splice, every drop cable, every home served by that branch. One wrong node can invalidate an entire network segment.

This means the standard “generate → review → iterate” loop that works for text-based agents is insufficient. By the time a human reviews a geospatial plan, the interconnected errors may be too deeply embedded to untangle without starting over.


Five Constraints That Change Everything

1. Geographic Reality as a Hard Boundary

LLMs have no spatial intuition. They can generate plausible-sounding coordinates, but they cannot reason about what exists at those coordinates — buildings, roads, rivers, elevation changes, right-of-way restrictions.

Consider a simple task: place a fiber distribution cabinet to serve 12 homes in a residential neighborhood. A human planner looks at the map and immediately eliminates locations that are mid-intersection, on private property, or inaccessible to maintenance vehicles. An LLM generating coordinates will produce a point that is mathematically central to the 12 homes but may land on a highway median.

The key insight is that geocoding is necessary but not sufficient. Converting “123 Main Street” to coordinates solves the addressing problem but not the placement problem. The agent needs access to spatial context — road networks, building footprints, parcel boundaries — and must be constrained to place infrastructure only where it can physically exist.

We've found that the most reliable approach is not to make the LLM “smarter” about geography but to constrain its output space so that invalid placements are structurally impossible. The agent proposes intent (“serve these homes from a cabinet near this intersection”), and a deterministic spatial engine resolves the exact coordinates.
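A minimal sketch of this intent-to-coordinates split, assuming the spatial engine has already produced a list of pre-validated candidate sites (the function names and the candidate-list representation here are illustrative, not our actual API):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def resolve_placement(intent_lat, intent_lon, candidate_sites):
    """Snap the agent's intended location to the nearest pre-validated site.

    candidate_sites: (lat, lon) points already filtered for zoning,
    right-of-way, and vehicle access by the deterministic spatial engine.
    The LLM's coordinates express intent only; they are never the output.
    """
    if not candidate_sites:
        raise ValueError("no valid candidate sites in the target area")
    return min(
        candidate_sites,
        key=lambda site: haversine_m(intent_lat, intent_lon, site[0], site[1]),
    )
```

Because the output space is the candidate list itself, a placement on a highway median is structurally impossible rather than merely unlikely.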

2. Standards Compliance Is Non-Negotiable

Fiber-optic networks operate under international standards that govern everything from cable color coding to maximum splice loss. These aren't suggestions — they're contractual requirements that determine whether an installation passes inspection.

Take TIA-598, the standard for fiber color coding. A 12-fiber cable uses a specific color sequence: blue, orange, green, brown, slate, white, red, black, yellow, violet, rose, aqua. Buffer tubes follow a similar scheme. When an agent assigns fibers in a splice closure, every single assignment must follow this standard, or the technician in the field will connect the wrong fibers and bring down service for an entire neighborhood.

An LLM can memorize this color table. But the challenge isn't memorization — it's contextual application across hundreds of simultaneous assignments while respecting port availability, upstream capacity, and splitter ratios. In a network serving 2,000 homes, the agent might need to generate 6,000+ individual fiber assignments, each one standards-compliant and consistent with every other.

This is where most teams make their first mistake: they treat standards compliance as a validation step that happens after generation. It needs to be an inline constraint during generation. Validating 6,000 assignments after the fact and asking the LLM to “fix” the violations produces worse results than constraining the generation to only produce valid assignments in the first place.
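As an illustration of constraining generation rather than validating it, a fiber-assignment helper can derive colors from position, so the agent never emits a color at all. The sketch below assumes 12-fiber buffer tubes whose tube colors repeat the same sequence, which is a simplification for illustration:

```python
# TIA-598 12-color sequence, as listed above; position in a buffer tube
# determines the color deterministically.
TIA_598_COLORS = [
    "blue", "orange", "green", "brown", "slate", "white",
    "red", "black", "yellow", "violet", "rose", "aqua",
]

def fiber_color(fiber_number):
    """Return (tube color, strand color) for a 1-based fiber number,
    assuming 12-fiber buffer tubes with the same repeating color scheme."""
    if fiber_number < 1:
        raise ValueError("fiber numbers are 1-based")
    tube_index, strand_index = divmod(fiber_number - 1, 12)
    return TIA_598_COLORS[tube_index % 12], TIA_598_COLORS[strand_index]
```

An assignment tool built on this helper cannot produce a non-compliant color, no matter how many thousands of assignments it generates.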

3. Topological Reasoning Over Physical Graphs

A fiber network is a tree. Not metaphorically — it's a literal directed acyclic graph where optical signals split at specific ratios as they travel from source (OLT) to endpoint (home). The agent must reason over this topology to make decisions that are locally correct AND globally consistent.

Here's what this means in practice. A 1:32 split ratio means one OLT port can serve 32 homes. But you rarely deploy a single 1:32 splitter — the standard architecture cascades: a 1:4 splitter at the cabinet level, then 1:8 splitters at closure level, giving you 4 × 8 = 32 homes per OLT port.

When the agent adds the 33rd home to a cabinet's service area, it can't just create another connection. It needs to recognize that the cascade is full and either allocate a new OLT port, restructure the split ratios, or flag the capacity issue. This requires maintaining a running model of port utilization, split ratios, and optical loss budgets across the entire tree, not just the local node.

Key insight: LLMs are poor at maintaining global state across many sequential decisions. By the 200th node placement in a session, the model has functionally lost track of capacity decisions made at node 15. The solution isn't a bigger context window — it's externalizing the topological state into a structured store that the agent reads from and writes to at each decision point.
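A minimal sketch of such an externalized store, assuming the cascaded 1:4 × 1:8 architecture described above (the class and method names are illustrative):

```python
class PortCapacityStore:
    """Structured state the agent queries explicitly, instead of
    'remembering' capacity decisions in its context window."""

    HOMES_PER_OLT_PORT = 4 * 8  # 1:4 cabinet splitter x 1:8 closure splitters

    def __init__(self):
        self.homes_per_port = {}  # olt_port_id -> homes assigned so far

    def try_assign(self, olt_port_id):
        """Assign one home to a port; refuse if the cascade is full."""
        used = self.homes_per_port.get(olt_port_id, 0)
        if used >= self.HOMES_PER_OLT_PORT:
            return False  # agent must allocate a new port or restructure
        self.homes_per_port[olt_port_id] = used + 1
        return True
```

The 33rd assignment fails deterministically whether it happens on the 33rd tool call or the 3,000th, because the state lives outside the model.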

4. Optical Physics Sets Distance Limits

Every meter of fiber, every splice, every connector, and every splitter introduces optical loss. The signal leaving an OLT has a fixed power budget — typically around 28 dB for a GPON system. Every component in the path consumes some of that budget:

Component            Typical Loss
Fiber attenuation    ~0.35 dB/km at 1310 nm
Mechanical splice    ~0.1 dB each
Fusion splice        ~0.05 dB each
1:8 splitter         ~10.5 dB
1:32 splitter        ~17.5 dB
Connectors           ~0.5 dB each

When the total path loss from OLT to ONT exceeds the budget, the connection fails. This means the agent can't just draw the shortest-path cable route between two points — it needs to calculate cumulative loss along the entire optical path, including every splice closure, every splitter, and the cable distance at each segment.

A network that looks perfect on the map — clean topology, efficient routing, correct split ratios — can be physically undeployable if the most distant home exceeds the optical budget by 2 dB.

We've learned that optical budget validation must run continuously during design, not as a post-hoc check. Every time the agent places a node or routes a cable, the budget for all affected paths needs to be recalculated. This is computationally cheap (it's arithmetic) but architecturally demanding — it means the agent needs access to a running loss model of the entire network at every decision point.
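The arithmetic really is cheap. Here is a sketch of a running loss model built from the typical values in the table above; a production system would load vendor-specified losses instead of these constants:

```python
# Typical per-component losses from the table above (illustrative values).
LOSS_DB = {
    "fiber_per_km": 0.35,     # at 1310 nm
    "fusion_splice": 0.05,
    "mechanical_splice": 0.1,
    "connector": 0.5,
    "splitter_1_8": 10.5,
    "splitter_1_32": 17.5,
}

def path_loss_db(fiber_km, fusion_splices, connectors, splitters):
    """Cumulative optical loss along one OLT-to-ONT path.

    splitters: list of keys such as "splitter_1_8".
    """
    loss = fiber_km * LOSS_DB["fiber_per_km"]
    loss += fusion_splices * LOSS_DB["fusion_splice"]
    loss += connectors * LOSS_DB["connector"]
    loss += sum(LOSS_DB[s] for s in splitters)
    return loss

def within_budget(loss, budget_db=28.0):
    """Check against a typical GPON power budget of ~28 dB."""
    return loss <= budget_db
```

Running `path_loss_db` for every affected path after each node placement or cable route is the continuous validation described above.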

5. The Coordinate Precision Problem

In text generation, the difference between a good and great output is qualitative. In geospatial work, it's often a matter of decimal places.

At latitude 31° (roughly Jerusalem), one degree of longitude spans approximately 95 km. Six decimal places of precision therefore resolve to roughly 0.1 meters, enough to distinguish which side of a wall a cable enters. Five decimal places give you about 1 meter; four give you about 10 meters.
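These figures follow from a spherical-Earth approximation of about 111.32 km per degree at the equator; a quick sketch:

```python
import math

def lon_meters_per_degree(lat_deg):
    """Approximate meters per degree of longitude at a given latitude,
    assuming a spherical Earth (~111.32 km per degree at the equator)."""
    return 111_320 * math.cos(math.radians(lat_deg))

def precision_m(lat_deg, decimal_places):
    """Ground resolution of N longitude decimal places at this latitude."""
    return lon_meters_per_degree(lat_deg) / 10 ** decimal_places
```

At latitude 31°, `precision_m(31, 6)` is on the order of a tenth of a meter, while `precision_m(31, 4)` is closer to ten meters.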

This matters because the agent frequently needs to perform spatial operations: calculate cable lengths between nodes, determine if a home falls within a closure's service radius, or check if a proposed route crosses a restricted area. Small coordinate errors accumulate across these operations.

The failure is subtle. A cable length calculated from 5-decimal-place coordinates might read 342 meters when the actual distance is 347 meters. That 5-meter discrepancy propagates into the optical loss budget, the bill of quantities, and the material procurement list. Multiply this across 500 cable segments and you've introduced systematic error into the entire project.

The principle we've adopted: never let the LLM generate raw coordinates. All coordinate data originates from a geocoding service or a user click on the map. The LLM references locations by semantic identifier (“the cabinet near Herzl and Rothschild”), and deterministic systems resolve these to precise coordinates.


Design Principles That Emerged

After multiple iterations of our geospatial agent system, five principles crystallized:

1. Constrain, Don't Correct

It is significantly more reliable to prevent the agent from producing invalid outputs than to detect and fix violations after the fact. This applies to every constraint category: standards compliance, topological validity, optical budgets, and coordinate precision.

In practice, this means the agent's tool set is designed so that calling the tools correctly guarantees a valid output. The tool to connect two nodes checks capacity before creating the connection. The tool to assign fibers enforces TIA-598 ordering internally. The agent doesn't need to “know” the standard — the tool embodies it.

2. Externalize All State

The LLM's context window is not a database. Any state that needs to persist across more than a few operations — port allocations, fiber assignments, cumulative optical loss, cable inventories — must live in an external structured store that the agent queries explicitly.

This seems obvious in retrospect, but the temptation to let the LLM “remember” decisions is strong, especially early in development when networks are small and the context window can technically hold the entire state. It breaks catastrophically at scale.

3. Separate Intent from Execution

The agent should express what it wants to do, not how to do it at the coordinate and data-structure level. “Connect these 8 homes to this closure using a 1:8 splitter” is a valid agent output. The coordinates, fiber assignments, splice schedules, and bill-of-quantities entries are computed deterministically from that intent.
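A sketch of what an intent record and its deterministic expansion might look like; the field names and the port-assignment rule here are hypothetical, chosen only to show the shape of the separation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConnectIntent:
    """The agent's output: what to do, not how."""
    closure_id: str
    home_ids: tuple
    splitter: str  # e.g. "1:8"

def expand_intent(intent):
    """Deterministically derive (home_id, splitter_port) pairs from intent.

    In a full system, coordinates, splice schedules, and bill-of-quantities
    entries would be derived the same way, by code rather than by the LLM.
    """
    ports = int(intent.splitter.split(":")[1])
    if len(intent.home_ids) > ports:
        raise ValueError("more homes than splitter ports")
    return [(home, port + 1) for port, home in enumerate(intent.home_ids)]
```

The intent record is what lands in the audit log; the expansion is reproducible from it at any time.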

This separation is what makes the system auditable. A human reviewer can read the agent's intent log and understand the design decisions without parsing thousands of coordinate pairs.

4. Validate Continuously, Not Finally

Every operation should leave the network in a valid state. This is the geospatial equivalent of keeping your tests green after every commit. If you defer validation to the end, you'll face a combinatorial explosion of interacting violations that are nearly impossible to debug.

We enforce this by making each tool call atomic and validated: the network is consistent before the call and consistent after. If a tool call would create an invalid state (exceeding port capacity, violating optical budget), it fails immediately with a specific reason, and the agent can adjust its approach.
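A toy illustration of the atomic, validate-before-mutate pattern, with capacity checking standing in for the full set of invariants:

```python
class CapacityError(Exception):
    pass

class Network:
    """Every mutation either succeeds and leaves the network valid,
    or raises with a specific reason and changes nothing."""

    def __init__(self):
        self.connections = {}  # closure_id -> list of home ids
        self.capacity = {}     # closure_id -> max homes

    def add_closure(self, closure_id, max_homes):
        self.capacity[closure_id] = max_homes
        self.connections[closure_id] = []

    def connect_home(self, closure_id, home_id):
        # Validate BEFORE mutating, so a failed call cannot corrupt state.
        if len(self.connections[closure_id]) >= self.capacity[closure_id]:
            raise CapacityError(
                f"{closure_id} is at capacity ({self.capacity[closure_id]} homes)"
            )
        self.connections[closure_id].append(home_id)
```

The specific exception message is what lets the agent adjust its approach instead of guessing why the call failed.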

5. Domain Knowledge Lives in Tools, Not Prompts

Early on, we tried embedding FTTH expertise into the system prompt — pages of standards documentation, splitter specifications, cable types. This produced agents that could discuss fiber optics fluently but still made fundamental design errors.

The shift that worked was moving domain knowledge into the tool implementations. The agent doesn't need to know that a 1:8 splitter introduces 10.5 dB of loss — the optical budget tool knows this and enforces it. The agent doesn't need to memorize TIA-598 — the fiber assignment tool implements it.

This dramatically simplified the prompt, which improved the agent's reasoning, while making domain compliance structural rather than probabilistic.


Common Pitfalls and How to Avoid Them

1. Treating the Map as a Canvas

The most natural mental model is that the agent “draws” on a map — placing nodes and drawing lines. This leads to architectures where the LLM generates geometric primitives (points, polylines) and the system renders them.

The problem is that maps aren't canvases. A point on a map has context: what's underneath it, what zoning applies, what existing infrastructure is nearby. An architecture that treats placement as coordinate generation will produce designs that look correct on screen but fail in the field.

Mitigation: Model placement as a search problem (find valid locations), not a generation problem (create coordinates).

2. Over-Relying on the Context Window

For a small network (50 homes, 5 closures), everything fits in the context window and the agent appears to reason correctly about topology. Teams ship this, then discover at 500 homes that the agent is making contradictory decisions because it's lost track of assignments made earlier in the conversation.

Mitigation: Design for external state from day one, even if the prototype doesn't need it. The architectural cost is low; the refactoring cost later is severe.

3. Conflating Precision with Accuracy

An agent that outputs coordinates to 8 decimal places looks precise. But if those coordinates came from the LLM's parametric knowledge rather than a geocoding service, they can be meters or even kilometers from the actual location — while looking authoritative.

Mitigation: Treat any coordinate not sourced from a geocoder or explicit user input as untrusted. Flag and reject LLM-generated coordinates by default.
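One way to make this structural is to attach provenance to every coordinate and reject untrusted sources by default; a sketch (the source labels are illustrative):

```python
# Only coordinates from a geocoding service or an explicit user click
# are trusted; anything else, including LLM output, is rejected.
TRUSTED_SOURCES = {"geocoder", "user_click"}

def accept_coordinate(lat, lon, source):
    """Admit a coordinate into the system only with trusted provenance."""
    if source not in TRUSTED_SOURCES:
        raise ValueError(f"untrusted coordinate source: {source!r}")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinate out of range")
    return (lat, lon)
```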

4. Validating Components Instead of Systems

A node passes validation. A cable passes validation. A splice assignment passes validation. But the network formed by combining them violates optical budgets, because no individual check verifies end-to-end path loss.

Mitigation: Always validate at the path level, not just the component level. After each modification, verify that all affected paths (from OLT to every downstream ONT) remain within budget.
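A sketch of path-level validation over a parent-pointer tree, with per-hop losses as inputs (the data representation is illustrative):

```python
def path_losses(parents, hop_loss_db, leaves):
    """Sum per-hop losses from each leaf (ONT) up to the root (OLT).

    parents: node -> parent node (root maps to None).
    hop_loss_db: node -> loss contributed at that hop, in dB.
    """
    out = {}
    for leaf in leaves:
        total, node = 0.0, leaf
        while node is not None:
            total += hop_loss_db.get(node, 0.0)
            node = parents.get(node)
        out[leaf] = total
    return out

def violating_paths(parents, hop_loss_db, leaves, budget_db=28.0):
    """Leaves whose end-to-end path exceeds the optical budget."""
    losses = path_losses(parents, hop_loss_db, leaves)
    return [leaf for leaf, loss in losses.items() if loss > budget_db]
```

Each component along a violating path can pass its own check; only the end-to-end sum reveals the problem.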

5. Ignoring the Sequence of Agent Decisions

The order in which an agent places infrastructure matters. Placing the OLT first and working outward (top-down) produces different networks than starting from homes and working inward (bottom-up). Neither is universally correct, but mixing strategies mid-design produces inconsistent results.

Mitigation: Establish a clear traversal order and enforce it. Our most reliable results come from a bottom-up approach — cluster homes first, then place closures to serve clusters, then connect closures to cabinets, then connect cabinets to the OLT. This ensures that every upstream node is sized and placed based on actual demand rather than estimated demand.
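The bottom-up order can be sketched as two deterministic steps; the greedy grouping below is a naive stand-in for a real spatial clustering algorithm, with cluster size matched to a 1:8 splitter:

```python
def cluster_homes(homes, max_size=8):
    """homes: list of (lat, lon). Greedy sweep in coordinate order,
    producing groups of at most max_size homes per closure."""
    ordered = sorted(homes)
    return [ordered[i:i + max_size] for i in range(0, len(ordered), max_size)]

def closure_sites(clusters):
    """One closure per cluster, at the cluster centroid. In practice the
    centroid would then be snapped to a valid candidate site by the
    spatial engine rather than used directly."""
    return [
        (sum(lat for lat, _ in c) / len(c), sum(lon for _, lon in c) / len(c))
        for c in clusters
    ]
```

Because closures are placed only after clusters exist, every closure is sized and located from actual demand, which is the point of the bottom-up traversal.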


When These Lessons Apply

These constraints are not unique to fiber-optic networks. Any domain where AI agents generate outputs that map to physical systems will encounter some combination of:

  • Hard physical constraints (energy grids, water distribution, transportation networks)
  • Regulatory standards (building codes, environmental regulations, safety certifications)
  • Graph-topological reasoning (supply chains, logistics networks, utility systems)
  • Precision requirements (surveying, manufacturing, robotics path planning)
  • Cumulative error propagation (any system where individual outputs are interconnected)

If your agent's mistakes cost more to fix in the physical world than the agent saved in the digital one, the principles in this post apply.


Summary

Geospatial AI agents occupy a specific and underexplored region of the agent design space. The constraints are harder, the failure modes are costlier, and the standard playbook of “generate, validate, iterate” is insufficient.

The principles that have survived production for us reduce to this: make the LLM do less, not more. Let it express intent and make design decisions. Let deterministic systems handle coordinates, standards compliance, optical physics, and state management. The agent's value isn't in knowing that a 1:8 splitter introduces 10.5 dB of loss — it's in knowing that this neighborhood needs a 1:8 split based on the density, distance, and demand pattern.

Build the tools so that using them correctly is easier than using them incorrectly. Externalize every piece of state the agent needs to reference across operations. Validate continuously. And never, under any circumstance, let the LLM freehand coordinates on a map.