Gary Illyes’ Post

My mission this year is to figure out how to crawl even less, and have fewer bytes on wire. A few days ago there was a post on a Reddit community about how, in the OP's perception, Google is crawling less than in previous years. In the grand scheme of things that's just not the case; we're crawling roughly as much as before, but scheduling has become more intelligent and we're focusing more on URLs that are more likely to deserve crawling. However, we should, in fact, crawl less. We should, for example, be more intelligent about caching and internal cache sharing among user agents, and we should have fewer bytes on wire. If you've seen an interesting IETF (or other standards body) internet draft that could help with this effort, or an actual standard I might've missed, send it my way. Decreasing crawling without sacrificing crawl quality would benefit everyone.
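Purely as an illustration of what "internal cache sharing among user agents" could look like (a hypothetical sketch, not a description of Googlebot's internals; the SharedFetchCache class and the user-agent names are invented for this example), one fetch can be reused by several crawler identities instead of each hitting the origin:

```python
import time
import urllib.request

# Hypothetical sketch: several crawler user agents share one cached fetch
# per URL instead of each requesting the page from the origin separately.
class SharedFetchCache:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.cache = {}  # url -> (fetched_at, body_bytes)

    def fetch(self, url, user_agent):
        entry = self.cache.get(url)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]  # cache hit: no extra bytes on the wire for this agent
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        self.cache[url] = (time.time(), body)
        return body

cache = SharedFetchCache()
# Both "agents" below are served by a single origin fetch.
cache.fetch("https://example.com/", "CrawlerBot-Images/1.0")
cache.fetch("https://example.com/", "CrawlerBot-Video/1.0")
```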

Lucas Barnes

E-commerce | DTC Marketing Strategy | PNW Web Marketing

1y

Are you the person I have to blame for pumping tons of excess traffic to Reddit? My eyes are bleeding from the current search results.

Liam Fallen

I work in marketing.

1y

"My mission this year is to figure out how to crawl even less"... Have you tried walking mate?

JR Oakes

Code & Growth 🐍🍺

1y

If I read it correctly, Chunked Oblivious HTTP Messages from Feb seems interesting.

David Garcia

SEO Systems Engineer specializing in ChatGPT and Generative AI

1y

Apologies if you're already using this tactic, but what if you hashed or coded the data before transmission? So a specific hash = data/information which can be decoded on the other side of the wire. If you have a cache for the current state of a web page, and say 5% of it is changed, send the hash for the changed portion only.
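A rough sketch of that idea in Python (purely illustrative, not anything Google has said it does): split a page into chunks, hash each chunk, and compare against the hashes stored from the previous crawl, so only the changed portions would need to be re-sent or re-processed.

```python
import hashlib

def chunk_hashes(html: str, chunk_size: int = 2048):
    """Hash fixed-size chunks of a page so changes can be located cheaply."""
    return [
        hashlib.sha256(html[i:i + chunk_size].encode("utf-8")).hexdigest()
        for i in range(0, len(html), chunk_size)
    ]

def changed_chunks(old_hashes, new_hashes):
    """Return indexes of chunks whose hash differs from the cached version."""
    changed = []
    for i, new_h in enumerate(new_hashes):
        if i >= len(old_hashes) or old_hashes[i] != new_h:
            changed.append(i)
    return changed

# Tiny example: the two versions differ, so the (single) chunk is flagged.
old = chunk_hashes("<p>hello world, version one</p>")
new = chunk_hashes("<p>hello world, version two</p>")
print(changed_chunks(old, new))  # -> [0]
```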

S M Shakil Ahmed

SEO Consultant | Local Growth Expert | Amplifying Online Presence

7mo

You deserve a Nobel Prize for saving energy by crawling less.

Gary Sognon

Communications Manager at MEDIA SUR 7

11mo

Hello, thank you for the clarification, but I have a problem to share with you, because I can no longer solve it on my own. foot-sur7.fr is one of the main news sites covering the transfer window. For no apparent reason, last September we lost over 70% of our traffic. Our articles are being filtered in Google News, with a sharp reduction in reach in search results as well. I brought in SEO colleagues with a higher level of expertise, and there was no way to recover our traffic. We are in difficulty and I no longer know what to do. Do you have any ideas to suggest, even though I know this is not the appropriate place for this type of request?

Dave Smart

Technical SEO at Tame the Bots

1y

I notice in crawling that 304 seems to be more of a thing for static assets, especially images, than for pages as such; many of the requests coming from Googlebot don't seem to carry If-None-Match or If-Modified-Since. Feels like a good way to reduce overall crawl overhead (albeit you're going to at least need to request and receive the headers). I'm sure there are probably plenty of heuristics collected that perhaps show folks get it wrong more than right, though? I also guess things like client-side rendered sites make it hard to impossible for webmasters to implement that on web pages. But it does seem like a missed opportunity for sites that can work well with it.
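For anyone wanting to test this against their own site, here's a minimal sketch of a conditional re-fetch (using the third-party requests library; example.com is a placeholder): store the ETag and Last-Modified from the first response, send them back as If-None-Match / If-Modified-Since, and treat a 304 as "reuse the cached copy".

```python
import requests  # third-party: pip install requests

def conditional_fetch(url, cached=None):
    """Re-fetch a URL, reusing the cached body when the server answers 304."""
    headers = {}
    if cached:
        if cached.get("etag"):
            headers["If-None-Match"] = cached["etag"]
        if cached.get("last_modified"):
            headers["If-Modified-Since"] = cached["last_modified"]

    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and cached:
        return cached  # nothing re-downloaded beyond the response headers

    return {
        "body": resp.text,
        "etag": resp.headers.get("ETag"),
        "last_modified": resp.headers.get("Last-Modified"),
    }

first = conditional_fetch("https://example.com/")
second = conditional_fetch("https://example.com/", cached=first)  # 304 if unchanged
```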

Dan Taylor ♒

Enterprise SEO Consulting | Partner & Head of Technical SEO at SALT.agency | 2018 TechSEO Boost Innovation Prize Winner

1y

This. I've seen the same Reddit posts. I think it's a matter of perception within the OP's own scope, but on a wider scale it makes sense to prioritize resources across discovery and refresh categories. I also feel, FWIW, that Google (et al.) have been reallocating resources this way for some time now; I wrote about it back in 2022. It also makes sense why some search engines have adopted IndexNow: it takes less resource to receive submissions than to discover URLs.

Mike Hardaker

CEO of Mountain Weekly News | Founder Jackson Estate Finds | Owner Barron's Leash Boutique Dog Sitting Service | Host of the Everything Snowboarding & More Podcast

11mo

Gary, are crawl budgets limiting the number of images in Google Search now? Instead of showing all the images we shot of next year's snowboard with a professional photographer at Jackson Hole Mountain Resort, it seems Google has decided to show only a featured image in Google Image Search. I attached one that got left out; of the 10 stunning images I published, Google only showed 1. Here is the kicker, Gary Illyes: I am the ONLY person in the country to have tested the board. Even quoting out the search reveals horrible results. https://www.google.com/search?q=%22Ride+Moderator%22&tbm=isch&ved=2ahUKEwin54yo4bqFAxXVx8kDHYN0DSwQ2-cCegQIABAA&oq=%22Ride+Moderator%22&gs_lp=EgNpbWciECJSaWRlIE1vZGVyYXRvciIyBxAAGIAEGBgyBxAAGIAEGBhIpyxQxgVY8ypwAHgAkAEAmAF3oAGsBKoBAzAuNbgBA8gBAPgBAYoCC2d3cy13aXotaW1nwgIJEAAYgAQYGBgKiAYB&sclient=img&ei=xycYZufnO9WPp84Pg-m14AI&bih=703&biw=1536&prmd=sivnmbtz My content site, testing outdoor gear personally in the Tetons, has lost 98% of its organic search traffic from Google. No manual penalties, and we've been here since 2008. Industry leader, and now you're essentially wanting to remove my content. It's really getting tiresome. At some point I will stop publishing.


I've been thinking about how we can do more with less. Not with search, but with other types of site analysis. We've had the luxury of treating the internet as infinite for most of its existence. That type of thinking is just not something our planet can sustain. Building on sitemap.xml files at least provides a framework where the site owner and the SEO company can work collaboratively to identify when new content has been added to the site, as in the sketch below.
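As a toy illustration of that collaborative signal (assumptions: the sitemap exposes <lastmod> values, and last_crawl is whatever timestamp the crawler recorded for its previous visit), a crawler could diff the sitemap against its last visit and only queue URLs that claim to have changed:

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_changed_since(sitemap_url, last_crawl):
    """Yield sitemap URLs whose <lastmod> is newer than the last crawl time."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", namespaces=NS)
        lastmod = url_el.findtext("sm:lastmod", namespaces=NS)
        if not lastmod:
            continue  # no change signal; left to the crawler's own scheduling
        modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)
        if modified > last_crawl:
            yield loc

last_crawl = datetime(2024, 1, 1, tzinfo=timezone.utc)
for url in urls_changed_since("https://example.com/sitemap.xml", last_crawl):
    print("recrawl:", url)
```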
