The Log File Analyzer is a technical SEO tool that analyzes your access logs and presents a report on how Googlebot crawls your website. Access logs are kept by a web server and record every request made to a website, from both bots and people. This information helps SEOs track technical issues and optimize crawl budget.
- Analyze status codes and file types
- Track desktop and mobile bot activity
- Identify the most crawled pages on a website
- Discover opportunities to manage bot activity and optimize crawl budget
- Eliminate structural and navigational problems that affect the accessibility of certain pages
Why should you use this tool?
Analyzing a log file manually is tiresome. Unless you’re highly trained in technical website analysis, it can be an arduous task that leaves you cross-eyed and confused. If you want the quickest way to read an access log and understand how Google’s bots interact with your website, this is the tool for you. If you work with clients, you can use it to assess a new client’s website and establish a technical roadmap for restructuring and improving the website’s crawlability.
How does it work?
First, make sure that your log file is unarchived and in the proper access.log file format. Then simply drag and drop your file into the form on the Log File Analyzer page to upload it to the tool. Please note that the maximum file size for upload is 1GB. The proper file format is “Combined Log Format” and it uses the following structure:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
Where:
- h — the host / IP address from which the request was made to the server
- l — client id, usually stays blank (represented by a hyphen (-) in the file)
- u — username, usually stays blank (represented by a hyphen (-) in the file)
- t — the time and time zone of the request to the server
- r — the request line, containing the method, the requested resource, and the protocol version
- s — the HTTP status code
- b — the size of the object requested (in bytes)
- Referer — the URL source of the request (previous page), often stays blank (represented by a hyphen (-) in the file)
- User-Agent — the HTTP header containing information about the requesting client (application, language, etc.)
Sample string:
66.249.64.222 - - [29/Jun/2018:13:43:07 +0100] "GET /samplepage.html HTTP/1.1" 200 2887 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.domain.com/bot.html)"
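For illustration, a line in this format can be split into the fields described above with a short script. This is only a sketch using Python’s standard library, not part of the tool itself:

```python
import re

# Regex matching the Combined Log Format fields described above.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('66.249.64.222 - - [29/Jun/2018:13:43:07 +0100] '
        '"GET /samplepage.html HTTP/1.1" 200 2887 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.domain.com/bot.html)"')

fields = LOG_PATTERN.match(line).groupdict()
print(fields["host"])     # 66.249.64.222
print(fields["status"])   # 200
print(fields["request"])  # GET /samplepage.html HTTP/1.1
```

Each named group corresponds to one field of the format string (h, l, u, t, r, s, b, Referer, User-agent).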
Additional Supported Log File Formats
Log File Analyzer also supports the following log file formats:
- W3C
- Kinsta
- Combined log format variations
Start the Log File Analyzer
Once all of your files are uploaded, click “Start Log File Analyzer”. The charts show you how Googlebot behavior has changed over time. You can filter to show only desktop or mobile bot activity with the “All Google Bots” filter above the table, and adjust the time period. The charts on the right tell you how many of each status code and file type the bots interacted with.
In the table below, you can analyze all of the paths that received the most bot hits in the selected time frame. To dig deeper into this report, you can filter by status code, keyword in the path, or file type.
With this information you can check the consistency of response statuses to investigate any availability issues. You can also investigate bot hits by content type, which helps you understand whether your crawl budget spending has changed over time.
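The kind of aggregation behind this report can be sketched in a few lines. The sample hits below are made up for illustration; the tool computes the same counts from your uploaded logs:

```python
from collections import Counter

# Hypothetical parsed bot hits as (path, status_code) pairs, as a
# log parser might yield them.
hits = [
    ("/samplepage.html", 200),
    ("/old-page.html", 404),
    ("/samplepage.html", 200),
    ("/style.css", 200),
]

# Count responses per status code to spot availability issues.
status_counts = Counter(status for _, status in hits)
# Count hits per path to find the most crawled pages.
path_counts = Counter(path for path, _ in hits)

print(status_counts[200])          # 3
print(path_counts.most_common(1))  # [('/samplepage.html', 2)]
```

A spike in non-200 counts over time, or a shift in which paths dominate the hit counts, is exactly the signal the charts and table surface.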
File type filters include:
- HTML
- PHP
- AMP
- CSS
- JavaScript
- JSON
- etc.
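As a rough illustration of how a requested path might map to one of these file types, here is a sketch that infers the type from the URL’s extension (the tool’s exact classification rules may differ):

```python
from pathlib import PurePosixPath

# Map a requested path to a file-type bucket by its extension.
# The mapping below is illustrative, not the tool's actual list.
def file_type(path: str) -> str:
    ext = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
    return {
        ".html": "HTML",
        ".php": "PHP",
        ".css": "CSS",
        ".js": "JavaScript",
        ".json": "JSON",
    }.get(ext, "other")

print(file_type("/samplepage.html"))  # HTML
print(file_type("/app.js?v=2"))       # JavaScript
```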
Deleting previously uploaded logs
If you are dealing with more than one website, you may need to analyze log files from different sources. In this case, you’ll have to erase all previously uploaded data. To do so, use the “Delete Data” button in the upper right corner of the screen.
After you confirm the data deletion, the system will take you back to the initial screen so that you can upload a new log file.