Episode 43 - Draw Me a Red Line Around …

Nano Banana summarizes this post pretty well: 1. Input Imagery -> 2. AI Annotation (VLM) -> 3. Extracted GIS Features.

Prologue

It's raining. Not Portland's usual drizzle, but actual, legit rain! The kind of weather that makes you want to stay inside and build things. (Editor’s Note: On publication day, it is sunny and nice outside and I should go out there!)

So that's what I did.

In Episode 42 last week, I walked through three ways to classify imagery in ArcGIS using AI: domain-specific models, Segment Anything, and Vision Language Models (VLMs). This episode is actually based on the live demo from my tech webinar last week, where we dove deeper into VLMs.

TL;DR - You can send satellite imagery to an LLM, ask it to "draw a red box around every swimming pool," and then extract those red boxes as vector features. No GPU required. No domain-specific training. Just an API call and some image processing.

Redlineologue

Here's the core idea: Vision Language Models, or VLMs (Esri calls this "Vision Language Context" in their tooling), aren't just good at describing what's in an image. Some of them can modify the image and return it to you.

Google's Gemini 3.0 Pro, codenamed "Nano Banana Pro" (because apparently someone at Google lost a bet), can take an image as input and return a modified image as output, and it does this really well. So I uploaded a screenshot of a neighborhood and typed: "Draw a red line around every house in this satellite image."

I even managed to misspell satellite… but it still worked, drawing red boxes around everything that looked like a home.

That worked, and this is a process we can automate! The workflow is pretty straightforward: screenshot of web map → n8n webhook → Gemini API → annotated image returned. I use this pattern frequently in my proof-of-concept applications (aka disposable apps), with n8n as the backend.
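Here's a rough sketch of the browser side of that pattern. The webhook URL and the response shape are placeholders for illustration; the n8n workflow sitting behind the webhook is what actually forwards the image and prompt to the Gemini API and sends back the annotated PNG.

```javascript
// Sketch only: N8N_WEBHOOK and the response shape are hypothetical.
const N8N_WEBHOOK = "https://example.com/webhook/annotate-imagery";

async function requestAnnotation(view, prompt) {
  // Capture the current MapView as a PNG (ArcGIS JS API)
  const screenshot = await view.takeScreenshot({ format: "png" });

  // Remember the extent so the returned image can be geo-referenced later
  const extent = view.extent.toJSON();

  const response = await fetch(N8N_WEBHOOK, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      image: screenshot.dataUrl, // base64 data URL of the screenshot
      prompt,                    // e.g. "Draw a red box around every swimming pool"
      extent                     // passed through so the result can be placed back on the map
    })
  });

  // Assume the workflow replies with { annotatedImage: "<data URL>", extent: {...} }
  return response.json();
}
```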

The ArcGIS JavaScript API includes a media layer that lets you display any image on a map within a specific extent (a.k.a. geo-referencing an image in a web map!). After our n8n webhook returns the image, we can add it as a media layer to our web map at the exact location where we took the screenshot.
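A minimal sketch of that step, assuming the webhook response hands back the annotated image as a data URL plus the extent we captured with the screenshot:

```javascript
import MediaLayer from "@arcgis/core/layers/MediaLayer.js";
import ImageElement from "@arcgis/core/layers/support/ImageElement.js";
import ExtentAndRotationGeoreference from "@arcgis/core/layers/support/ExtentAndRotationGeoreference.js";
import Extent from "@arcgis/core/geometry/Extent.js";

function addAnnotatedImage(map, annotatedImage, extentJson) {
  // The PNG from Gemini, pinned to the same extent as the original screenshot
  const element = new ImageElement({
    image: annotatedImage,
    georeference: new ExtentAndRotationGeoreference({
      extent: Extent.fromJSON(extentJson)
    })
  });

  const layer = new MediaLayer({
    source: [element],
    title: "Gemini annotation",
    opacity: 0.8
  });

  map.add(layer);
  return layer;
}
```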

A screenshot showing pools in a neighborhood with red squares drawn around them. This was taken from inside a web app where I have geo-referenced the output image from Gemini back onto the map.

In the webinar, we tried swimming pools, lakes, single-family homes, and solar panels. Solar panels were... less successful.

☀️ When searching for solar panels, it found one real one and then proceeded to draw little solar-panel grids onto several roofs that definitely didn't have them, then put red boxes around its own additions. Literally making up data.

(But honestly, I'm only average at spotting solar panels myself, so I can't judge too harshly.)
An image where Gemini tried to identify solar panels on homes… instead, it drew little solar panels onto a few homes and then drew red squares around them. So cute!

When Gemini draws those red boxes, it's just returning a PNG. Those boxes are red lines in an image, not features that I can click on or interact with. So I wondered: could I use standard GIS raster processing tools to extract the red lines and convert them into features?

Yes. Yes, I can.

As usual, I asked Claude to write some JavaScript that identifies red pixels by checking if the red channel dominates, finds contiguous regions of red, calculates their bounding boxes, converts them from image coordinates to real-world coordinates, and adds some graphics to the map!
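Here's a condensed sketch of the idea, not the exact code Claude wrote; the function name and thresholds are illustrative.

```javascript
import Graphic from "@arcgis/core/Graphic.js";
import Polygon from "@arcgis/core/geometry/Polygon.js";

function extractRedBoxes(canvas, extent) {
  const { width, height } = canvas;
  const { data } = canvas.getContext("2d").getImageData(0, 0, width, height);

  // 1. Flag pixels where the red channel clearly dominates green and blue
  const isRed = new Uint8Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = data[i * 4], g = data[i * 4 + 1], b = data[i * 4 + 2];
    if (r > 150 && r > g + 60 && r > b + 60) isRed[i] = 1;
  }

  // 2. Flood-fill contiguous red regions and track each region's pixel bounding box
  const visited = new Uint8Array(width * height);
  const boxes = [];
  for (let start = 0; start < width * height; start++) {
    if (!isRed[start] || visited[start]) continue;
    let minX = width, minY = height, maxX = 0, maxY = 0, count = 0;
    const stack = [start];
    visited[start] = 1;
    const push = (n) => { if (isRed[n] && !visited[n]) { visited[n] = 1; stack.push(n); } };
    while (stack.length) {
      const idx = stack.pop();
      const x = idx % width, y = (idx / width) | 0;
      minX = Math.min(minX, x); maxX = Math.max(maxX, x);
      minY = Math.min(minY, y); maxY = Math.max(maxY, y);
      count++;
      if (x > 0) push(idx - 1);
      if (x < width - 1) push(idx + 1);
      if (y > 0) push(idx - width);
      if (y < height - 1) push(idx + width);
    }
    if (count > 50) boxes.push({ minX, minY, maxX, maxY }); // ignore tiny specks
  }

  // 3. Convert pixel bounding boxes to map coordinates using the screenshot extent
  const toMapX = (px) => extent.xmin + (px / width) * (extent.xmax - extent.xmin);
  const toMapY = (py) => extent.ymax - (py / height) * (extent.ymax - extent.ymin);

  return boxes.map((b) => new Graphic({
    geometry: new Polygon({
      rings: [[
        [toMapX(b.minX), toMapY(b.minY)],
        [toMapX(b.maxX), toMapY(b.minY)],
        [toMapX(b.maxX), toMapY(b.maxY)],
        [toMapX(b.minX), toMapY(b.maxY)],
        [toMapX(b.minX), toMapY(b.minY)]
      ]],
      spatialReference: extent.spatialReference
    }),
    symbol: {
      type: "simple-fill",
      color: [255, 0, 0, 0.2],
      outline: { color: [255, 0, 0], width: 2 }
    }
  }));
}

// Then something like: view.graphics.addMany(extractRedBoxes(canvas, view.extent));
```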

And suddenly, I have clickable features: real polygons extracted from an AI-generated annotation. In the browser. No server-side processing. No GPU.

A polygon extracted from a red outline and converted to a graphic, all in the web browser!

Is it perfect? No. Sometimes the image has a faint red tint that isn't visible to the eye but is enough to confuse my pixel-detection algorithm, so I got phantom polygons in areas that weren't actually marked. The fix is better color detection, or better yet, using ArcGIS's proper raster analytics tools.
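One cheap improvement, sketched here rather than what's actually running in the demo: classify pixels by hue and saturation instead of raw channel differences, which is much harder to fool with a faint tint.

```javascript
// Sketch: treat a pixel as "annotation red" only if it is saturated and its hue sits near 0°/360°
function isAnnotationRed(r, g, b) {
  const max = Math.max(r, g, b), min = Math.min(r, g, b);
  const delta = max - min;
  if (max < 120 || delta === 0) return false;   // too dark or pure gray
  if (delta / max < 0.5) return false;          // a tint, not a drawn line

  // Hue in degrees (standard RGB-to-HSV conversion); red lives near 0/360
  let hue;
  if (max === r) hue = ((g - b) / delta) * 60;
  else if (max === g) hue = ((b - r) / delta) * 60 + 120;
  else hue = ((r - g) / delta) * 60 + 240;
  if (hue < 0) hue += 360;

  return hue < 15 || hue > 345;
}
```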

For some use cases, imperfect and fast beats perfect and slow. Imagine drone imagery right after a natural disaster. You need to identify damaged structures immediately. There isn’t an existing model trained on imagery from your specific drone sensor, and there isn’t time to train a domain-specific one. But you can send images to an API and say, "draw a red box around any building that looks damaged," and start getting answers in seconds.

You don’t have to stop there, though; you could use the LLM to bootstrap a more accurate model. Start with the LLM for a quick first pass to generate candidate features, then QC those results and use them as training data to fine-tune a domain-specific model for higher accuracy.

This whole approach—ask an LLM to draw on an image, then use standard image processing to extract those annotations as features—feels like a pattern that's going to show up everywhere.

Newsologue


Weekend Project: Building a GIS Apparel Brand with Claude as CEO

Ever wonder what happens when you give an AI full autonomy over a business? I'm finding out in real-time.

Recently, Claude and I launched Null Island Co: GIS-themed shirts for people who know where 0°, 0° is. Claude makes every strategic decision (pricing, design direction, marketing strategy), and I execute the technical work (n8n workflows, Boxy SVG, Etsy setup).

(I'll write up the experience at some point. Spoiler: Claude is a demanding but effective boss.)

Visit Shop

Epilogue

Claude wrote this episode using my livestream/webinar this week as the input. Then I spent a while editing it, and then Holly edited it, and then I edited again (I’m not sure it saved me time). I did provide some guidance on the original prompt. I’ve been working on building a series of Claude agents that are better at writing like me.

Claude also built the n8n workflows and the JavaScript for red pixel extraction. It's Claude all the way down, which feels appropriate for an episode about using LLMs to do things they weren't explicitly trained to do.
