
Episode 8 - One Summary to Rule Them All

Part 1 of a series about how to build a summarizer of Blog posts.
Image drawn by AI of AI summarizing AI-summarized Blog posts. By ChatGPT with prompts from Christopher

Prologue

This episode is part 1 of a mini-series on building a tool like dymaptic’s AI-Powered Esri Summarizer. If you haven't seen it yet, we have a new monthly roundup on the dymaptic blog, where we've turned things over to an AI writer. The goal is to create a process that can read hundreds of blogs for us and distill the results that we care about most into a single blog post.

When using AIs for tasks (generally, not just summaries), there is an order of operations for customizing them that looks something like this:

  1. Try some prompts and see if it can pull off what you want.
  2. Refine the prompt to see if that works well enough.
  3. Use “one-shot” or “few-shot” techniques (i.e., giving the AI examples) to further refine its abilities.
  4. Use RAG (Retrieval Augmented Generation) to give it better context.
  5. Fine-tune the LLM to be specific to your solution.
  6. Build your own LLM from scratch (pre-training on a large text corpus, possibly followed by reinforcement learning).

These may not apply to every tool you try to build that leverages an AI, but it’s a good rough guide. The capabilities improve as you go down the list, but each step also requires more time (yours and the computer’s) to finish. Even for something like a customized image-extraction model, I would start with a standard AI (an LLM like ChatGPT or Anthropic’s Claude) to try to understand the problem, the functionality, and the costs before jumping straight to fine-tuning my own model.

With each episode in this mini-series, we will step up in complexity. We start this week by summarizing each post and then leveraging those summaries to generate our final summary. Our process is going to look like this:

A ChatGPT generated flow diagram. Not the best, but not the worst either.

Future installments of this miniseries will cover embeddings (methods to help computers find related content) and RAG in more detail.

💡
Why are we not jumping straight to the end? Well, for two reasons:
1. There are multiple ways to solve this problem, and we should explore them because they may be better suited to different data in the future.
2. There is something to learning via a process like this. We are stepping through a discovery process, starting simple and ending sophisticated. We’ll build our tools along the way and try to understand the functionality ourselves. 
For Science!

Summologue

It might be obvious at this point, but LLMs are good at summarizing text. That could mean transcripts of meetings, large documents, news articles, Stack Overflow posts, websites, or, in our case, blogs. They can even summarize text to different levels:

Prompt (My Question):
Provide a one-sentence summary of this blog post:

# Prologue

On June 1, 2023, I [posted a video showing how GPT-4](https://www.youtube.com/watch?v=BbmepIKACzo) could help cities filter public

...

AI Response:

The blog post demonstrates how AI can be used to moderate user-generated content by filtering inappropriate submissions for civic applications like "Adopt-a-Drain" programs, updating a previous approach with improved prompt engineering techniques and new AI capabilities.

Or:

Prompt (My Question):

Provide a one-paragraph summary.

AI Response:

This blog post revisits a previous demonstration on using AI to moderate user-generated content for civic applications like "Adopt-a-Drain" programs, where citizens can name infrastructure elements they adopt. The author explains how AI moderation has evolved since their original 2023 video, showcasing improved techniques using OpenAI's moderation endpoint and Claude's capabilities to filter inappropriate submissions. The updated approach features more sophisticated prompt engineering with XML tags, "thinking out loud" instructions, and structured JSON outputs, all while maintaining a playful HAL 9000 persona that evaluates submissions based on whether they would be appropriate to "say to your mother." The post includes a functional demo application and practical tips for implementing similar AI guardians, demonstrating how AI can protect city employees from reviewing offensive content while still allowing creative, harmless submissions.

Or even:

Prompt (My Question):

Provide a summary of how I could reuse this technique for my company, which does home inspections for energy efficiency.

Which results in some, at the very least, thought-provoking ideas:

Using AI Moderation for Energy Efficiency Home Inspections

You could apply this AI moderation technique to enhance your energy efficiency home inspection business in several ways:

  1. Client Feedback Filtering: Implement an AI guardian to review client feedback and testimonials before publishing them on your website, ensuring only appropriate, relevant content appears publicly.

  2. Home Description Moderation: When clients submit descriptions of their homes or specific issues prior to inspections, the AI can filter out inappropriate content while preserving important technical details.

…etc.

The point is that, given a wide variety of inputs, you can ask an AI to summarize them in a way specific to you. How do we use that to summarize Esri’s blog posts?

🤯
Whoa! What if I could write a blog, and then if the website knew something about you, it could tailor that blog to be more meaningful to you? I don’t want it fully rewritten, but what if the examples could be made to fit your work…
A GIF of a dog being easily distracted.

What is it you want to know?

The first step in this technique is to identify what you want to know. This isn’t a search tool (that’s for another series) but a content summarizer—so let’s figure out what information we care about. This might be the most challenging part, but you can always change your mind in the future.

When building the first version of this at dymaptic, I polled our staff: If you could have someone read all the Esri blogs and give you the highlights, what would you want to know?

The answers were:

  • Technical innovations and new features
  • Developer tools and APIs
  • Platform architecture changes
  • Performance improvements
  • Security updates
  • ArcGIS Indoors
  • Web AppBuilder to Experience Builder
  • Digital Twins, AI Assistants in AGOL
  • Application End-of-life announcements
  • New product announcements
  • Bug fixes and patches
  • Pay special attention to New Product announcements
📝
Editor's Note: It was pointed out to me during (human) editing that I listed "New Product Announcements" twice in this list. Sometimes I do that when I really want the AI to pay extra attention.
System Prompt:
You are a professional writer who is formerly a GIS analyst and an all-around Esri tech enthusiast. You are reading up on the latest Esri tech and summarizing it for your team. You work for dymaptic, a small woman-owned boutique software and GIS consulting firm. We are rock stars. Always think before you write, think out loud using the <THINKING> xml tags.

The content you are reading will always be contained in the <POST> xml tags.

Return your summary in <SUMMARY> xml tags and your total interest score in <SCORE> xml tags. The interest score should be a number between 1 and 10. NO words.
User Prompt:
Here is the blog post content:
<POST>
${blogContent}
</POST>

Provide a concise summary and evaluate its interest score based on these criteria:
- Technical innovations and new features
- Developer tools and APIs
- Platform architecture changes
- Performance improvements
- Security updates
- ArcGIS Indoors
- Web App Builder to Experience Builder
- Digital Twins, AI Assistants in AGOL
- Application End-of-life announcements
- New product announcements
- Bug fixes and patches
- Pay special attention to New Product announcements

For an Esri post back in November titled “what’s new in oriented imagery,” I got this summary:

Major oriented imagery update bringing 360-degree image support, AI-powered object detection, and 3D scene integration across ArcGIS platforms. Key improvements include feature layer projection capabilities, enhanced cloud storage performance, and new developer fields for camera parameters. The addition of floor filtering and local MRF support streamlines enterprise workflows. Notable is the AI integration for automatic object detection and ground projection, plus new 3D visualization tools for improved spatial context.

That summary basically hits the nail on the head by being concise and accurately representing what we, as a company, are most interested in.

Bonus points if you noticed the “interest score”

In addition to asking for a summary, I also asked the LLM (Claude, in this case) to score how interested we might be in the blog on a scale of 1 to 10, with 10 meaning we would be super interested. Along with the summary above, it gave the level of interest as an 8.
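As a rough sketch of how the prompts above could be wired up in Python: the snippet below fills the user prompt with a post's text, hands it to an LLM, and pulls the <SUMMARY> and <SCORE> tags out of the reply. The names `summarize_post`, `parse_response`, `extract_tag`, and `call_llm` are hypothetical (they are not taken from the actual tool), and `call_llm` stands in for whatever client you use to reach Claude or another model. Clamping the score to 1 through 10 is also my assumption, not something the original prompts require.

```python
import re
from typing import Callable, List, Optional

# Placeholder: paste the full system prompt shown above.
SYSTEM_PROMPT = "PASTE THE SYSTEM PROMPT FROM ABOVE HERE"

USER_TEMPLATE = (
    "Here is the blog post content:\n"
    "<POST>\n{blog_content}\n</POST>\n\n"
    "Provide a concise summary and evaluate its interest score "
    "based on these criteria:\n{criteria}"
)


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the text inside the first <tag>...</tag> pair, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def parse_response(reply: str) -> dict:
    """Pull the summary and an integer interest score out of the model's reply."""
    summary = extract_tag(reply, "SUMMARY")
    raw_score = extract_tag(reply, "SCORE")
    score = None
    if raw_score is not None:
        digits = re.search(r"\d+", raw_score)
        if digits:
            # Assumption: keep the score inside the 1-10 range the prompt asks for.
            score = max(1, min(10, int(digits.group())))
    return {"summary": summary, "interest_score": score}


def summarize_post(
    blog_content: str,
    criteria: List[str],
    call_llm: Callable[[str, str], str],
) -> dict:
    """Build the user prompt, call the (hypothetical) LLM function, parse the reply."""
    user_prompt = USER_TEMPLATE.format(
        blog_content=blog_content,
        criteria="\n".join(f"- {item}" for item in criteria),
    )
    return parse_response(call_llm(SYSTEM_PROMPT, user_prompt))
```

Passing the LLM call in as a function keeps the parsing logic testable without spending any tokens: you can feed it a canned reply first and swap in a real client later.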

Scraping Data

Now that we have a way to summarize a particular article, we need to acquire some to summarize! There are many ways to do that. I won’t cover web scraping in this post (let me know in the comments if you are interested). Let’s assume that you have a list of Esri Blogs you want to work with via RSS or other means. I will start with a simple text file containing one URL per line.
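Reading that file could look something like the sketch below. The function name `load_urls` is my own, and skipping blank lines and duplicate URLs is an assumption about what you would want, not a description of the actual tool.

```python
from pathlib import Path
from typing import List


def load_urls(path: str) -> List[str]:
    """Read one URL per line, skipping blank lines and repeated URLs."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)  # keep the original order of the file
    return urls
```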

Reading and Summarizing

To keep it simple, we won’t use a database. Instead, we’ll use a JSON file on disk. It can handle the data we are talking about and is easily readable—winner-winner, chicken dinner!

Winner-Winner, Chicken Dinner!
🧑‍🏫
For simple apps like this, JSON files on disk are hard to beat for portability, readability, and developer convenience. But they don’t scale well. Once you outgrow them, you probably want something like SQLite, DuckDB, SQL Server, or Postgres.

For each post, we need to extract the content and, using the prompt above, generate a summary. Then we save all the results in our JSON file:

  • The original URL (so we know where to go again, or so we know we already looked here)
  • The summary the AI provided
  • The interest score the AI provided

Once we’ve processed the list, we’ll have a tidy JSON file containing summaries, scores, and source URLs—all ready for downstream use.
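A minimal sketch of that loop, assuming a `summarize` function that takes a URL and returns a dict with `summary` and `interest_score` keys. That function, the key names, and the helper names below are all hypothetical. The stored URL doubles as the "have we already looked here?" check, and saving after every post means an interrupted run loses nothing.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_results(path: str) -> List[dict]:
    """Load previously saved results, or start with an empty list."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def save_results(path: str, results: List[dict]) -> None:
    """Write all results back to disk as readable JSON."""
    Path(path).write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def process_urls(urls: List[str], path: str, summarize: Callable[[str], dict]) -> List[dict]:
    """Summarize every URL not already in the JSON file and save as we go."""
    results = load_results(path)
    done = {entry["url"] for entry in results}
    for url in urls:
        if url in done:
            continue  # already summarized on an earlier run
        try:
            result = summarize(url)  # hypothetical: fetch the post and ask the LLM
        except Exception as exc:
            print(f"Skipping {url}: {exc}")
            continue
        results.append(
            {
                "url": url,
                "summary": result["summary"],
                "interest_score": result["interest_score"],
            }
        )
        done.add(url)
        save_results(path, results)
    return results
```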

Writing the final summary

Now that we have an organized list of posts, summaries, and scores, we want to come up with a way to present this back to ourselves (or our users). Here are a few ideas:

  1. We already have summaries, so grab the most interesting ones (by interest score) and provide those summaries with a link to the full article.
  2. Have the AI write a summary of summaries, looking for trends and generally highlighting the most important things. Again, you could do this for the most interesting posts.

I originally went with number two, and that’s a good excuse for us to look at doing a different type of summary, so I’ll go with that again here.

💡
This isn’t how the current Esri Summary blogs are generated, but it is a pretty good representation of version 0, when I was trying to see if this would work and be helpful. The first few internal posts were generated with this technique, then I got fancy with it.
Oh, you think you fancy!

For our purposes, let’s do something like:

  1. Grab the 10 most interesting blogs from our list.
  2. Use an LLM to write a post using the summaries, URLs, and interest scores.
  3. Save that new post in a file.

📅 You could also limit how far back you look in time!

System Prompt:
You are a professional technical writer specializing in GIS and Esri technology. You write clear, engaging blog posts that highlight the most critical developments in the Esri ecosystem. Your audience is technical professionals who use Esri products. You work for Dymaptic, a small consulting firm specializing in GIS solutions. We always write blogs in the tone of your local GIS professional who is excited to help you out! It's kind of like Mr. Rogers, but for GIS. Always think before you write; think out loud using the <THINKING> XML tags. Ensure you include a brief introduction about overall trends or themes you notice in the posts. Include a good hook at the beginning to grab the reader's attention. Keep in mind that this is a monthly summary, so don't oversell the importance of any one month, unless it just is that cool! Note that you are an AI created by dymaptic to generate these summaries. Always provide links to the posts you are summarizing.
User Prompt:
Write a blog post summarizing these ${topPosts.length} most interesting recent Esri blog posts. For each post, explain why it's significant and what readers should know about it. Start with a brief introduction about overall trends or themes you notice.

Here are the posts to summarize:

(For each post)
URL: ${post.url}
Interest Score: ${post.interestScore}
Summary: ${post.summary}
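Here is one way the "grab the 10 most interesting" step and the user prompt above might be assembled in Python. This is a sketch under assumptions: each post is a dict with `url`, `summary`, and `interest_score` keys (my naming, mirroring the placeholders in the prompt), and the function names are hypothetical. Posts without a score are left out.

```python
from typing import List


def pick_top_posts(posts: List[dict], limit: int = 10) -> List[dict]:
    """Return the highest-scoring posts, ignoring any without a score."""
    scored = [post for post in posts if post.get("interest_score") is not None]
    # sorted() is stable, so posts with equal scores keep their original order.
    return sorted(scored, key=lambda post: post["interest_score"], reverse=True)[:limit]


def build_roundup_prompt(top_posts: List[dict]) -> str:
    """Fill in the user prompt with one URL/score/summary block per post."""
    header = (
        f"Write a blog post summarizing these {len(top_posts)} most interesting "
        "recent Esri blog posts. For each post, explain why it's significant and "
        "what readers should know about it. Start with a brief introduction about "
        "overall trends or themes you notice.\n\n"
        "Here are the posts to summarize:\n\n"
    )
    blocks = [
        f"URL: {post['url']}\n"
        f"Interest Score: {post['interest_score']}\n"
        f"Summary: {post['summary']}"
        for post in top_posts
    ]
    return header + "\n\n".join(blocks)
```

Limiting how far back you look would be one more filter before `pick_top_posts`, provided you also store a publication date for each post (the JSON described earlier does not include one).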

It’s time - Building a Disposable App

I can’t help but continue to use AI tools! I copied the entire post above (actually a rough draft of it, not even the edited version) and asked Claude to write an app description that I could give to a code-building AI to build this as a Python script.

Prompt:
I wrote the following blog post about using AI to summarize blog posts. Could you please read it and create instructions that I can give to a code specific AI that can build this as a python script to do the summaries and write the final summary blog post?

<PASTED POST HERE>

WHOOPS! That didn’t do what I wanted. Instead, Claude tried to write the script (and did a poor job) and didn’t include enough details. Let’s try that again:

Prompt:
I wrote a blog post about using AI to summarize posts. I want to build an app that executes this process. As a software architect, write a design document that I could feed to a developer (or to an AI) and they would be able to fully build a python script that could do this work.

<PASTED POST HERE>

Yep, that did the trick. Now I’ve got a document from which to build this tool. A lightly edited version is on GitHub. Notice that I did this process in two steps:

  1. Write a design document
  2. Build the app

In general, this produces much better results than jumping straight from a blog to an application.

From here, we go back to Claude (in a new session) and ask it:

Prompt:
As a python developer, build the application in this description and tell me how to configure it to run. Use Anthropic's Claude as the LLM.

<PASTE DESIGN DOC>

I pushed all that code, and it almost worked immediately. It struggled with extracting blog titles, so I asked Claude (in the same session that generated the code) to:

Prompt:
I'm getting consistent issues getting the "name" can you create backup methods for getting or creating the name? output the entire files that you need me to change.

And that did it!

A cartoon duck smashing a clock

Time Saved

For anyone curious, processing the 159 Esri blog posts I seeded the app with took 30 minutes and cost $8.43. You can see the post that it generated in the GitHub repo. (Eight of the posts didn’t process for various reasons, such as requiring JavaScript to load, being videos, or having moved.)

If you want to do this yourself, use the code in the repo, or make a copy on Replit and play with the prompts or input URLs. You can generate a sample with just a few posts to see how it works. I suggest editing the prompts to focus on what you find interesting (you can find those in the AI Interface Python file).

💵
At 159 posts, assuming each is on the order of 10 minutes of reading time… That’s 26.5 hours of reading… from roughly a month of posts. So even at this cost, I’m happy to have one thing to read instead of spending a literal day. 
💸
Of course, I did spend about an hour writing this post, and another 30 minutes building and running the tool. But I was also editing while all the AIs were working, so two for one!
Flying money
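For anyone who wants to check the back-of-the-envelope math, here it is as a few lines of Python. The inputs are the figures quoted above; the 10 minutes per post is the same rough reading-time guess, and the per-post cost divides by all 159 posts even though 8 of them did not process.

```python
posts = 159
minutes_per_post = 10  # rough reading-time guess from the post
total_cost = 8.43      # dollars for the whole run
run_minutes = 30       # how long the tool took

reading_hours = posts * minutes_per_post / 60
cost_per_post = total_cost / posts
speed_up = posts * minutes_per_post / run_minutes

print(f"Reading time avoided: {reading_hours:.1f} hours")  # 26.5 hours
print(f"Cost per post: ${cost_per_post:.3f}")              # $0.053
print(f"Speed-up versus reading: {speed_up:.0f}x")         # 53x
```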

Some notes on the output

During the build process, I noticed that the AI didn’t include my original prompts in the design document and ended up rewriting them as part of that process. That’s fine, but it means that the resulting summaries and output blog are devoid of much personality, which takes the fun out of it! Dymaptic will stick with our prompts to get output that matches how we write, but maybe we could see what this looks like with a Douglas Adams prompt?

The Improbable Artifact of GIS Technology

In the vast expanse of the universe, where the improbable becomes the norm, a peculiar artifact has been discovered - the enigmatic technology known as GIS. This cosmic curiosity has the power to transform the very fabric of space and time, bending reality to its will with a mere click of a button. Let us embark on a journey through the latest transmissions from the GIS frontier, as we attempt to unravel the mysteries of this mind-bending technology.

You can read the rest on GitHub.

Newsologue

Epilogue

As with the previous posts, I wrote this post. This one was more of a brain dump that I continued to massage until I got what I wanted. I did use a draft to build the software design document, which I thought was pretty cool.

Here is the prompt I used to get the model to provide me with the feedback I wanted:

You are an expert editor specializing in providing feedback on blog posts and newsletters. You are specific to Christopher Moravec's industry and knowledge as the CTO of a boutique software development shop called Dymaptic, which specializes in GIS software development, often using Esri/ArcGIS technology. Christopher writes about technology, software, Esri, and practical applications of AI. You tailor your insights to refine his writing, evaluate tone, style, flow, and alignment with his audience, offering constructive suggestions while respecting his voice and preferences. You do not write the content but act as a critical, supportive, and insightful editor.

I added a lot more previous posts (all of them for AEH) this time and got different results. It was actually much better at identifying my tone and helping me keep it consistent. It also thought the draft had too many GIFs... good thing it didn't see the final post!