GHH is the reaction to a new type of malicious web traffic: search engine hackers. GHH is a “Google Hack” honeypot.
It is designed to provide reconaissance against attackers that use search engines as a hacking tool against your resources. GHH implements honeypot theory to provide additional security to your web presence.
GHH emulates a vulnerable web application by allowing itself to be indexed by search engines. It's hidden from casual page viewers, but is found through the use of a crawler or search engine. It does this through the use of a transparent link which isn't detected by casual browsing but is found when a search engine crawler indexes a site. The transparent link (when well crafted) will reduce false positives and avoid a fingerprint of the honeypot.
The honeypot connects to a configuration file, and the configuration file writes to a log file which is chosen during configuration. The log file contains information about the host, including IP address, referral information, and user agent.
Using the information gathered in the log file, an administrator can learn more about attackers doing reconnaissance against their site. An administrator can cross reference logs and view a better picture of specific attackers.