We have some statistics that are calculated and written to our application logs, and we wanted to plot them with our already running Graphite service. There are two ways to approach this. The first is to send statistics directly with a Graphite client library for your language of choice. For example, in Python it is super simple to set up graphitesend and send metrics. But that requires adding extra code to your business application just to ship metrics to Graphite. My strong personal opinion is that your application should be responsible for business logic, not for sending stats to Graphite.
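For comparison, here is roughly the kind of code the direct approach adds to an application. This is a minimal sketch that writes Carbon's plaintext protocol over a raw socket rather than using graphitesend, so it assumes no client-library API. Port 2003 is Carbon's default plaintext port. The host name and metric path in the usage comment are hypothetical.

```python
import socket
import time


def format_metric(path, value, timestamp):
    """Build one line of Carbon's plaintext protocol: '<path> <value> <unix timestamp>\\n'."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, path, value, port=2003, timestamp=None):
    """Open a TCP connection to Carbon and send a single metric line."""
    if timestamp is None:
        timestamp = time.time()
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Usage (hypothetical host and metric path):
# send_metric("graphite.example.com", "myapp.requests.total", 120)
```

Small as it is, this logic still lives inside the business application, which is exactly what the Logstash approach avoids.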

Enter Logstash! The second approach is to let Logstash pick the statistics out of your logs and send them to Graphite.

Here's the config, broken down into smaller chunks:

Read the log file with the multiline codec:
input {
  # stdin is handy for testing: pipe sample log lines into Logstash
  stdin { }
  file {
    path => "/var/log/your_app/your_app.log"
    start_position => "beginning"
    # Unless a line starts with a digit, append it to the previous line
    codec => multiline {
      pattern => "^\d"
      negate => true
      what => "previous"
    }
  }
}

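Here we set up Logstash to read from the file /var/log/your_app/your_app.log. The input uses the multiline codec, which appends the current line to the previous one unless the current line starts with a digit. With negate => true, every line that does not match the pattern ^\d is joined to the previous line. As a result, each stack trace is condensed into a single event together with the log line that precedes it. Also note that we start reading from the beginning of the log file by setting start_position to "beginning". This only applies to files Logstash has not seen before. For files it has already read, it resumes from the position recorded in its sincedb.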
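To see what negate => true combined with what => "previous" does, here is a small Python sketch that imitates the codec's grouping on a list of lines. It is not Logstash's implementation, since the real codec works on a stream and buffers the current event until a new one starts. The sample log lines are made up for illustration.

```python
import re

# Same settings as the codec above: pattern => "^\d", negate => true, what => "previous".
# A line matching the pattern starts a new event; with negate => true, every line
# that does NOT match is appended to the previous event.
EVENT_START = re.compile(r"^\d")


def fold_lines(lines):
    """Group raw log lines into events, imitating the multiline codec settings."""
    events = []
    for line in lines:
        # A leading non-matching line has nothing to attach to, so it starts the first event.
        if EVENT_START.match(line) or not events:
            events.append(line)
        else:
            # Continuation line (e.g. a stack-trace frame): join it to the previous event.
            events[-1] += "\n" + line
    return events


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = [
    "2015-03-01 10:00:00 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 3, in <module>',
    "2015-03-01 10:00:01 INFO recovered",
]

if __name__ == "__main__":
    for event in fold_lines(SAMPLE_LOG):
        print(repr(event))
```

The two traceback lines do not start with a digit, so they end up in the same event as the ERROR line.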

Parse each log line to look for certain patterns:
filter {
  grok {
    match => { "message" => "Number of requests:%{SPACE}%{NUMBER:TotalReq},%{SPACE}Successes:%{SPACE}%{NUMBER:TotalSuccess},%{SPACE}Errors:%{SPACE}%{NUMBER:TotalErrors}.%{GREEDYDATA:LogMessage}" }
    add_tag => [ "API_Stats", "Regular_Logs" ]
    # Don't tag events meant for the other pattern with _grokparsefailure
    tag_on_failure => []
  }
  grok {
    # One pattern on a single line; %{SPACE} allows optional whitespace between segments
    match => { "message" => "API errors:%{SPACE}Total Errors: %{NUMBER:API_TotalErrors}%{SPACE}\[400: %{NUMBER:API_400_Errors},%{SPACE}401: %{NUMBER:API_401_Errors},%{SPACE}404: %{NUMBER:API_404_Errors},%{SPACE}4xx: %{NUMBER:API_4xx_Errors},%{SPACE}500: %{NUMBER:API_500_Errors},%{SPACE}5xx: %{NUMBER:API_5xx_Errors},%{SPACE}others: %{NUMBER:API_Others_Errors}\]" }
    add_tag => [ "API_Error_Stats", "Regular_Logs" ]
    tag_on_failure => []
  }
}

In the filter block, each log line is matched against two grok patterns. Let's look at the first one. If a log line matches it, the line is tokenized into TotalReq, TotalSuccess, TotalErrors and LogMessage, and the tags API_Stats and Regular_Logs are added. Similarly, if a line matches the second pattern, it is tagged with API_Error_Stats and Regular_Logs and tokenized into:

  • API_TotalErrors
  • API_400_Errors
  • API_401_Errors
  • API_404_Errors
  • API_4xx_Errors
  • API_500_Errors
  • API_5xx_Errors
  • API_Others_Errors
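To check what the two patterns capture, here is a rough Python equivalent that uses the re module with named groups. It is an approximation in three ways. First, NUMBER and SPACE are simplified stand-ins for grok's own definitions. Second, the second pattern is read as one expression whose segments are joined by optional whitespace. Third, the sample messages are invented. As with grok, the captured values come back as strings.

```python
import re

# Simplified stand-ins for grok's building blocks (grok's real NUMBER is more permissive).
NUMBER = r"[+-]?\d+(?:\.\d+)?"
SPACE = r"\s*"

API_STATS = re.compile(
    rf"Number of requests:{SPACE}(?P<TotalReq>{NUMBER}),{SPACE}"
    rf"Successes:{SPACE}(?P<TotalSuccess>{NUMBER}),{SPACE}"
    rf"Errors:{SPACE}(?P<TotalErrors>{NUMBER}).(?P<LogMessage>.*)"
)

API_ERROR_STATS = re.compile(
    rf"API errors:{SPACE}Total Errors: (?P<API_TotalErrors>{NUMBER}){SPACE}"
    rf"\[400: (?P<API_400_Errors>{NUMBER}),{SPACE}"
    rf"401: (?P<API_401_Errors>{NUMBER}),{SPACE}"
    rf"404: (?P<API_404_Errors>{NUMBER}),{SPACE}"
    rf"4xx: (?P<API_4xx_Errors>{NUMBER}),{SPACE}"
    rf"500: (?P<API_500_Errors>{NUMBER}),{SPACE}"
    rf"5xx: (?P<API_5xx_Errors>{NUMBER}),{SPACE}"
    rf"others: (?P<API_Others_Errors>{NUMBER})\]"
)

RULES = (
    (API_STATS, ["API_Stats", "Regular_Logs"]),
    (API_ERROR_STATS, ["API_Error_Stats", "Regular_Logs"]),
)


def parse(message):
    """Return (tags, fields) for the first pattern found anywhere in the message.

    Logstash runs both grok filters on every event; returning the first match is
    a simplification that is fine when a line can match at most one pattern.
    """
    for pattern, tags in RULES:
        match = pattern.search(message)
        if match:
            return list(tags), match.groupdict()
    return [], {}


if __name__ == "__main__":
    print(parse("2015-03-01 10:05:00 INFO Number of requests: 120, Successes: 117, Errors: 3. window=60s"))
    print(parse("API errors: Total Errors: 7 [400: 1, 401: 2, 404: 0, 4xx: 1, 500: 2, 5xx: 0, others: 1]"))
```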

The most important thing to remember is that the actual values are stored in the tokens listed above. That means we can send these tokens to Graphite and plot their values. That's what follows next.

Send the tokens to Graphite:
output {
  if "API_Stats" in [tags] {
    # Print matching events to the console; useful while debugging
    stdout { codec => rubydebug }
    graphite {
      host => "10.11.12.13"
      port => 2003
      # Graphite metric path => value to send; path components are dot-separated
      metrics => {
        "environments.staging.servers.stga-API.overall.total_requests" => "%{TotalReq}"
        "environments.staging.servers.stga-API.overall.total_success" => "%{TotalSuccess}"
        "environments.staging.servers.stga-API.overall.total_errors" => "%{TotalErrors}"
      }
    }
  } else if "API_Error_Stats" in [tags] {
    stdout { codec => rubydebug }
    graphite {
      host => "10.11.12.13"
      port => 2003
      metrics => {
        "environments.staging.servers.stga-API.API.total_errors" => "%{API_TotalErrors}"
        "environments.staging.servers.stga-API.API.total_400_errors" => "%{API_400_Errors}"
        "environments.staging.servers.stga-API.API.total_401_errors" => "%{API_401_Errors}"
        "environments.staging.servers.stga-API.API.total_404_errors" => "%{API_404_Errors}"
        "environments.staging.servers.stga-API.API.total_4xx_errors" => "%{API_4xx_Errors}"
        "environments.staging.servers.stga-API.API.total_500_errors" => "%{API_500_Errors}"
        "environments.staging.servers.stga-API.API.total_5xx_errors" => "%{API_5xx_Errors}"
        "environments.staging.servers.stga-API.API.total_others_errors" => "%{API_Others_Errors}"
      }
    }
  }
}

With the graphite output, it's pretty straightforward to send these values to Graphite. All you need to provide is the host, the port and the metrics setting. The metrics setting maps each Graphite metric path to the value sent for it, and %{FieldName} is replaced with that field's value from the event. Graphite uses dots to separate the components of a metric path. For example, "%{API_TotalErrors}" is the total number of API errors and is sent to environments.staging.servers.stga-API.API.total_errors.
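The graphite output writes lines in Carbon's plaintext protocol to the configured host and port. Each line holds one metric: a path, a value and a Unix timestamp, separated by spaces. The Python sketch below imitates that rendering step for a single event. It expands the %{...} references against the event's fields and then formats the lines. It is a simplified model, not the plugin's code. Here the timestamp is passed in explicitly, whereas the plugin derives it from the event's @timestamp. The event values and timestamp are hypothetical, and the paths use dots as Graphite's separator.

```python
import re

# Matches %{FieldName} references, as used in the metrics setting.
FIELD_REF = re.compile(r"%\{([^}]+)\}")


def sprintf(template, event):
    """Substitute %{field} references with event values; unknown references are left untouched."""
    return FIELD_REF.sub(lambda m: str(event.get(m.group(1), m.group(0))), template)


def carbon_lines(metrics, event, timestamp):
    """Render one Carbon plaintext line ('<path> <value> <timestamp>') per metric."""
    return [
        f"{sprintf(path, event)} {sprintf(value, event)} {int(timestamp)}"
        for path, value in metrics.items()
    ]


# Mirrors the API_Stats branch of the output; event and timestamp are made up.
METRICS = {
    "environments.staging.servers.stga-API.overall.total_requests": "%{TotalReq}",
    "environments.staging.servers.stga-API.overall.total_success": "%{TotalSuccess}",
    "environments.staging.servers.stga-API.overall.total_errors": "%{TotalErrors}",
}
EVENT = {"TotalReq": "120", "TotalSuccess": "117", "TotalErrors": "3"}

if __name__ == "__main__":
    for line in carbon_lines(METRICS, EVENT, 1425204300):
        print(line)
```

Each line becomes one data point under its path in Graphite, so the three values from a single log line show up as three separate series.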

For the sake of completeness, here's the full configuration with the important parts highlighted.
<code data-gist-id="48d9a423efa7e098d05f" data-gist-highlight-line="4-5,16,20,32-34,43-46">

That's it! Your stats are now available in Graphite. Have you tried Graphite with Logstash? What's your experience? Did you use a different configuration?