John Mueller of Google wrote an extremely detailed and transparent explanation of how Google crawls pages. His reply explains why Google (and third-party SEO tools) do not crawl or index every link or URL on the internet. He said that crawling is not objective, that it is costly and can be inefficient, that the web changes constantly, and that there is junk and spam to deal with, all of which a crawler has to take into account.
Why Google Doesn't Crawl and Index Every URL
John wrote this long response on Reddit to answer the question of why SEO tools do not display every backlink, but he did so from a Google Search perspective. Here is his explanation.
Google doesn't crawl and index every URL. It's practically impossible to crawl everything because the number of URLs is effectively infinite, and no one can afford to store an endless quantity of URLs in a database. All web crawlers rely on assumptions, simplifications, and guesses to figure out what is worth crawling.
Even then, in practice, you can't crawl the entire internet all the time. The internet doesn't have enough connectivity and bandwidth to support it, and visiting a large number of pages frequently costs a lot of money (for both the crawler and the website owner).
Some pages change rapidly, while others haven't changed in ten years, so crawlers try to save effort by focusing on the pages they expect to change rather than the ones they expect to stay the same.
Then there's the question of how crawlers determine which websites have value. The internet is full of garbage that no one cares about and websites that serve no real purpose. These pages may still change and may even have decent URLs, but they are useless and exist without adding value. Any search engine would ignore them in order to offer only the best to its users. Sometimes the junk isn't obvious, either: many sites are technically fine but don't reach the quality threshold that would warrant crawling them more.
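As a rough illustration only (not Google's actual logic, which is not public), the kind of prioritization described above can be sketched as a simple scoring function: pages that are expected to change often and that look useful get crawled sooner, while pages below an assumed quality threshold are skipped entirely. Every name, score, and threshold here is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy model of crawl prioritization. The fields and numbers are
# invented for illustration; real crawlers use far richer signals.

@dataclass
class Page:
    url: str
    expected_change_rate: float  # estimated changes per day (a guess)
    quality_score: float         # 0.0-1.0, estimated usefulness (a guess)

QUALITY_THRESHOLD = 0.3  # assumed cutoff below which a page is not worth crawling

def crawl_priority(page: Page) -> float:
    """Higher value = crawl sooner. Returns 0 for pages below the quality bar."""
    if page.quality_score < QUALITY_THRESHOLD:
        return 0.0  # junk or low-value pages are ignored entirely
    return page.expected_change_rate * page.quality_score

pages = [
    Page("https://example.com/news", expected_change_rate=24.0, quality_score=0.9),
    Page("https://example.com/about", expected_change_rate=0.01, quality_score=0.8),
    Page("https://example.com/spam", expected_change_rate=50.0, quality_score=0.1),
]

# The news page wins; the spam page changes constantly but is never crawled.
for page in sorted(pages, key=crawl_priority, reverse=True):
    print(f"{page.url}: priority {crawl_priority(page):.2f}")
```

The point of the sketch is simply that "changes often" is not enough on its own; a page also has to be judged worth the crawl budget.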
Is It Theoretically Impossible to Crawl Every URL?
Thus, most crawlers (including SEO tools) work from a simplified set of URLs. They have to figure out how often to crawl, which URLs to visit more frequently, and which areas of the internet to avoid. There are no set rules for any of this, so each tool makes its own decisions. This is why different search engines index different sets of content, why SEO tools display different links, and why the metrics built on top of them differ.
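One way a simplified crawler might decide revisit frequency, sketched below purely as an assumption rather than any tool's documented behavior, is to shorten the wait when a page changed since the last visit and back off when it didn't. The intervals and factors are made up.

```python
# Hypothetical revisit scheduling for a simplified crawler: adapt how often a
# URL is fetched based on whether its content changed since the last visit.

MIN_INTERVAL_HOURS = 1
MAX_INTERVAL_HOURS = 24 * 30  # don't wait more than roughly a month

def next_interval(current_hours: float, changed: bool) -> float:
    """Return the number of hours to wait before the next visit."""
    if changed:
        # Content changed: check back twice as often (down to the minimum).
        return max(MIN_INTERVAL_HOURS, current_hours / 2)
    # Content unchanged: back off and spend the crawl budget elsewhere.
    return min(MAX_INTERVAL_HOURS, current_hours * 2)

# Example: a page that stops changing gradually drops out of the frequent queue.
interval = 12.0
for changed in [True, True, False, False, False]:
    interval = next_interval(interval, changed)
    print(f"changed={changed} -> revisit in {interval:g} hours")
```

Because every tool picks its own thresholds and signals for decisions like this, their crawled sets, and the link data reported on top of them, inevitably diverge.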
There are many other reasons why Google doesn't crawl and index every URL: highly complex code, a site that isn't focused on user engagement, and redirect loops are just a few of them.