A study of the Web's structure reveals that it isn't the fully interconnected network we've been led to believe. The study argues that the chance of being able to surf between two randomly chosen pages is less than one in four.

If we treat web pages as vertices and hyperlinks as edges, the Web can be represented as a directed graph. The question, then, is: what does the Web graph look like?
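As a minimal sketch of this model, assuming a small made-up graph (the page names `a` to `e` and the helper names are hypothetical), a Python dictionary can serve as an adjacency list, and breadth-first search answers whether one page can be reached from another. The second function computes the fraction of ordered page pairs (u, v) joined by a directed path u -> v, which is the quantity behind the "one in four" figure; counting all pairs like this is only practical for toy graphs.

```python
from collections import deque

# Hypothetical toy web graph: each page maps to the pages it links to.
WEB = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],          # a page with no outgoing links
    "e": ["a"],
}

def reachable_from(graph, start):
    """Return every page reachable from `start` by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reachable_pair_fraction(graph):
    """Fraction of ordered pairs (u, v), u != v, with a directed path u -> v.

    Assumes every link target also appears as a key in `graph`.
    """
    pages = list(graph)
    connected = sum(len(reachable_from(graph, u)) - 1 for u in pages)
    return connected / (len(pages) * (len(pages) - 1))

print(reachable_pair_fraction(WEB))
```

On this toy graph, 13 of the 20 ordered pairs are connected, so the function returns 0.65; page `d` has no out-links and reaches nothing else.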

In 1999, after the Web had been growing for the better part of a decade, Andrei Broder[2] and his colleagues set out to build a global map of the Web, using strongly connected components as the building blocks. They analyzed data from one of the largest commercial search engines at the time, AltaVista[3].

Among their findings was that the Web contains a giant strongly connected component (SCC). In case you don't know, a strongly connected component is a region in which you can get from any point to any other point along a directed path. In the Web graph, this means that from any webpage inside the giant SCC, you can reach any other webpage inside it simply by following a sequence of hyperlinks.
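Finding the giant SCC amounts to computing all strongly connected components and keeping the largest. The sketch below uses Kosaraju's two-pass depth-first search, which is one standard algorithm for this and not necessarily the method used in the study. The toy graph and function names are hypothetical, and the recursive DFS is only suitable for small graphs; a crawl with millions of pages would need an iterative version.

```python
from collections import defaultdict

# Hypothetical toy web graph: each page maps to the pages it links to.
WEB = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": [], "e": ["a"]}

def strongly_connected_components(graph):
    """Kosaraju's algorithm: return a list of SCCs, each a set of pages."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: DFS on the original graph, recording pages by finishing time.
    order, visited = [], set()
    def dfs1(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs1(v)
        order.append(u)
    for u in sorted(nodes):
        if u not in visited:
            dfs1(u)

    # Build the reversed graph (every link u -> v becomes v -> u).
    reverse = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 2: DFS on the reversed graph in decreasing finishing time;
    # each DFS tree found this way is one strongly connected component.
    components, assigned = [], set()
    def dfs2(u, comp):
        assigned.add(u)
        comp.add(u)
        for v in reverse[u]:
            if v not in assigned:
                dfs2(v, comp)
    for u in reversed(order):
        if u not in assigned:
            comp = set()
            dfs2(u, comp)
            components.append(comp)
    return components

components = strongly_connected_components(WEB)
giant = max(components, key=len)
print(giant)
```

Here the components are {a, b, c}, {d} and {e}; the largest, {a, b, c}, plays the role of the giant SCC.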

Two regions of roughly equal size sit on either side of the giant SCC (the core):

IN: nodes that can reach the giant SCC but cannot be reached from it, e.g. new web pages that have not yet been linked to; and
OUT: nodes that can be reached from the giant SCC but cannot reach it, e.g. corporate websites that contain only internal links.
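Once the giant SCC is known, IN and OUT follow from two breadth-first searches: OUT is everything reachable from the SCC by following links forward, and IN is everything reachable from it by following links backward, in both cases minus the SCC itself. This is a sketch on a hypothetical toy graph whose SCC is hard-coded rather than computed; the node names and helper functions are assumptions for illustration.

```python
from collections import deque

# Hypothetical toy web graph (page -> pages it links to).
WEB = {
    "s1": ["s2"], "s2": ["s3"], "s3": ["s1", "o1"],  # giant SCC: s1 -> s2 -> s3 -> s1
    "i1": ["s1", "t1", "tube1"],
    "t1": [], "t2": ["o1"], "tube1": ["o1"],
    "o1": [], "x1": ["x2"], "x2": [],
}
GIANT_SCC = {"s1", "s2", "s3"}  # assumed to have been found already

def reverse_graph(graph):
    """Flip every link u -> v into v -> u."""
    rev = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev

def bfs(graph, sources):
    """Return every node reachable from any node in `sources` (multi-source BFS)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def in_and_out(graph, scc):
    """IN = nodes that can reach the SCC; OUT = nodes reachable from it."""
    out_set = bfs(graph, scc) - scc
    in_set = bfs(reverse_graph(graph), scc) - scc
    return in_set, out_set

print(in_and_out(WEB, GIANT_SCC))
```

For this graph it returns IN = {i1} and OUT = {o1}; the remaining nodes are handled by the categories that follow.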

This structure of the Web is known as the bow-tie structure.


Some pages belong to none of IN, OUT, or the giant SCC, i.e. they can neither reach the giant SCC nor be reached from it. These are classified as:

Tendrils: nodes reachable from IN that cannot reach the giant SCC, and nodes that can reach OUT but cannot be reached from the giant SCC. If a tendril node satisfies both conditions, it is part of a tube that travels from IN to OUT without touching the giant SCC; and
Disconnected: nodes that belong to none of the previous categories.
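The remaining categories can be derived with the same two kinds of search, following the definitions above: among the pages outside the SCC, IN and OUT, tendrils are those reachable from IN or able to reach OUT, tube nodes satisfy both conditions, and everything else is disconnected. As before, the toy graph and the precomputed SCC, IN and OUT sets are hypothetical.

```python
from collections import deque

# Hypothetical toy web graph (page -> pages it links to).
WEB = {
    "s1": ["s2"], "s2": ["s3"], "s3": ["s1", "o1"],
    "i1": ["s1", "t1", "tube1"],
    "t1": [], "t2": ["o1"], "tube1": ["o1"],
    "o1": [], "x1": ["x2"], "x2": [],
}
# Assumed to have been computed already (giant SCC, IN and OUT).
SCC, IN, OUT = {"s1", "s2", "s3"}, {"i1"}, {"o1"}

def reverse_graph(graph):
    """Flip every link u -> v into v -> u."""
    rev = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev

def bfs(graph, sources):
    """Return every node reachable from any node in `sources`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def classify_rest(graph, scc, in_set, out_set):
    """Split pages outside SCC, IN and OUT into tendrils, tubes and disconnected."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    rest = nodes - scc - in_set - out_set
    from_in = bfs(graph, in_set) & rest                 # reachable from IN
    to_out = bfs(reverse_graph(graph), out_set) & rest  # can reach OUT
    tendrils = from_in | to_out
    tubes = from_in & to_out                            # both conditions hold
    disconnected = rest - tendrils
    return tendrils, tubes, disconnected

tendrils, tubes, disconnected = classify_rest(WEB, SCC, IN, OUT)
print(tendrils, tubes, disconnected)
```

On this graph, t1 hangs off IN, t2 leads into OUT, tube1 forms a tube from IN to OUT, and x1 and x2 are disconnected.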

The study gathered the following data:

Region       AltaVista, May 1999
Total        203.5 million pages
SCC          56.5 million
IN           43.3 million
OUT          43.1 million
Tendrils     43.7 million
Links        1,466 million (7.2 per page)
and others
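As a quick arithmetic check on the table, the snippet below computes each listed region's share of the 203.5 million pages, the number of pages not covered by any listed row, and the average number of links per page. Treating the leftover pages as part of "and others" (for example, disconnected pages) is an assumption, since the table does not break them down.

```python
# Sizes in millions, taken from the AltaVista (May 1999) table.
TOTAL = 203.5
REGIONS = {"SCC": 56.5, "IN": 43.3, "OUT": 43.1, "Tendrils": 43.7}
LINKS = 1466  # millions of links

def region_shares(total, regions):
    """Percentage of all pages that falls in each listed region."""
    return {name: 100 * size / total for name, size in regions.items()}

shares = region_shares(TOTAL, REGIONS)
unlisted = TOTAL - sum(REGIONS.values())  # pages not in any listed row
links_per_page = LINKS / TOTAL

for name, pct in shares.items():
    print(f"{name:<9}{pct:5.1f}%")
print(f"unlisted {unlisted:5.1f} million ({100 * unlisted / TOTAL:.1f}%)")
print(f"links per page: {links_per_page:.1f}")
```

It reports roughly 27.8% of pages in the SCC and about 21% each in IN, OUT and tendrils, leaves about 16.9 million pages (8.3%) outside the listed rows, and confirms 1,466 / 203.5 ≈ 7.2 links per page.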

Taken as a whole, the bow-tie structure gives a high-level view of the Web's structure, based on its reachability properties and on how its strongly connected components fit together.