Question 1
Can a crawler that only follows hyperlinks identify hidden pages that do not have any incoming links?
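To reason about this question, it can help to simulate a link-following crawler on a tiny graph. The sketch below is illustrative only: the `crawl` function, the `toy_links` dictionary, and the page names are all hypothetical, and the "web" is just an in-memory adjacency list rather than real HTTP fetching.

```python
from collections import deque


def crawl(seeds, links):
    """Breadth-first crawl that discovers pages only by following hyperlinks.

    seeds: iterable of starting page names (assumed known to the crawler).
    links: dict mapping a page name to the list of pages it links to.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy web: "hidden" has outgoing links, but no page links to it.
toy_links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["about"],
    "hidden": ["home"],
}

reached = crawl(["home"], toy_links)
print(sorted(reached))
```

In this toy graph the crawler can only reach pages that are connected to a seed by some chain of links, so comparing `reached` with the full set of keys in `toy_links` shows which pages it never discovers.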
Question 2
After obtaining the chunk’s handle and locations from the GFS master, the GFS client (application) obtains the actual file data directly from one of the GFS chunkservers.
Question 3
GFS is a parallel programming framework that allows parallelized construction of the inverted index.
Question 4
HITS and PageRank use only the inter-document links when calculating a document’s score, without considering the content of the document.
Question 5
Modern web search engines often combine many features (e.g., content-based scores, link-based scores) to rank documents.
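One simple way to picture such a combination is a weighted sum of per-document feature scores. The sketch below assumes a linear model with hand-picked weights; the function `combined_score`, the feature names `content` and `link`, and all numeric values are made up for illustration, and production engines typically learn how to combine many more features rather than fixing weights by hand.

```python
def combined_score(features, weights):
    """Weighted sum of feature scores; a missing feature counts as 0.0."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


# Hypothetical per-document scores, each already normalized to [0, 1].
docs = {
    "doc_a": {"content": 0.9, "link": 0.2},
    "doc_b": {"content": 0.5, "link": 0.9},
    "doc_c": {"content": 0.3, "link": 0.3},
}

# Assumed weights: content match counts somewhat more than link authority.
weights = {"content": 0.6, "link": 0.4}

# Rank documents from highest to lowest combined score.
ranking = sorted(
    docs,
    key=lambda doc_id: combined_score(docs[doc_id], weights),
    reverse=True,
)
print(ranking)
```

With these made-up numbers, a document with a moderate content score but a strong link score can outrank one that matches the query text better, which is the kind of trade-off a combined ranking function has to balance.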