Patroneos: Code Analysis, Benchmark results, Operation and Implementation Recommendations

Good morning. This is EOSeoul.

This article shares our understanding of Patroneos in three parts: 1) how Patroneos works, 2) benchmark results, and 3) operation and implementation recommendations. We assume the reader already understands the simple and advanced Patroneos settings.

All descriptions are based on the commits below:

  • Update on June 6 2018 14:30 KST (UTC+9)
    • Code revision analyzed: 6501e4429f43f4de78444a227149773b914e221b
    • Commit date of the analyzed code: Thu May 31 17:44:16 2018 -0400
    • Changes
      • Patroneos Issue # 26 - FIXED
      • validateMaxTransactions added
      • Recommendations updated
      • If important Patroneos issues for block producers come up before June 13, we will comment on this post.
  • Original post
    • Code revision analyzed: 48422fa05b47373ad68013f4d77d290e7fc31aae
    • Commit date of the analyzed code: Thu May 31 17:44:16 2018 -0400

Introduction

Block.one released Patroneos, one of its own software products, together with EOSIO Dawn 4.2. The name comes from the Harry Potter novels, in which a Patronus is conjured by a spell to repel creatures called Dementors. Patroneos borrows its name from this spell.

Many people around EOS have expressed concerns about various kinds of attacks on the EOS mainnet. The community and Block.one have improved the EOSIO software by working through different attack scenarios, and Patroneos is one of the results of these efforts. It filters out basic forms of Denial of Service attack and passes only normal transactions to the EOS RPC API Endpoint.

How it works

Code Configuration

The code consists of three main files: main.go, filter.go, and fail2ban-relay.go.

main.go

main.go implements the following functions.

  • Configuration File
    • Directive Config definition
    • Handlers that can update configuration files dynamically
    • Parsing configuration files
  • main function
    • Parsing command line arguments
    • Call handler according to execution mode

The program is implemented to operate in Filter mode (filter) or Relay mode (fail2ban-relay). Let's take a look at each.

filter.go

filter.go is a file that implements Filter mode.

  • Operation
    • Each HTTP request received by Patroneos passes through 5 successive checks.
    • If the request is valid, Patroneos forwards it to the API Endpoint that provides the EOS RPC API and returns the result to the client.
    • If the request is invalid, Patroneos immediately responds with HTTP 400 Bad Request and closes the connection.
    • If a valid request is forwarded and the API Endpoint returns any HTTP status code other than 200, a TRANSACTION_FAILED error is generated and the connection is closed.
    • Patroneos then sends the log message as an HTTP request to /patroneos/fail2ban-relay on the Patroneos running in Relay mode, or writes it to the log file, and closes the connection.
  • 5 validation logics
    • validateJSON(): verifies that the body of the HTTP POST is valid JSON. If it is not, an INVALID_JSON error occurs and the request is filtered.
    • validateMaxTransactions(): verifies that the number of transactions in a JSON array does not exceed the maximum. If it does, a TOO_MANY_TRANSACTIONS error occurs and the request is filtered.
    • validateTransactionSize(): verifies that the size of the transaction does not exceed the maximum. If it does, an INVALID_TRANSACTION_SIZE error occurs and the request is filtered.
    • validateMaxSignatures(): verifies that the number of signatures in the transaction does not exceed the maximum. If it does, an INVALID_NUMBER_SIGNATURES error occurs and the request is filtered.
    • validateContract(): verifies that the transaction does not call a blacklisted contract action. If it does, a BLACKLISTED_CONTRACT error occurs and the request is filtered.
    • In the original revision, validateTransactionSize(), validateMaxSignatures() and validateContract() assumed that the JSON in the HTTP request body is an Object. However, push_transactions of the HTTP Chain API uses a JSON Array, which was treated as PARSING_ERROR. We reported this as Patroneos Issue # 26. If the issue is resolved while this post can still be edited, this post will be updated. (Update: the issue has been fixed; see the change list at the top of this post.)
  • Logging and interaction with Relay mode
    • For all valid HTTP requests
      • If a Patroneos running in Relay mode is configured, the processing log is sent to it over HTTP.
      • If no Patroneos is operating in Relay mode, the log is written to the log file.
    • For all invalid HTTP requests
      • If a Patroneos running in Relay mode is configured, the processing log is sent to it over HTTP.
      • If no Patroneos is operating in Relay mode, the log is written to the log file.
      • Regardless of whether Relay mode is used, HTTP 400 Bad Request is sent to the HTTP client.
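The chain of checks described above can be sketched as follows. This is a minimal illustration and not the Patroneos source: the names limits, check and validate are hypothetical, only three of the five checks are shown, and the size check is simplified to the size of the whole request body.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// limits holds hypothetical thresholds; the field names are illustrative
// and are not Patroneos's actual configuration keys.
type limits struct {
	maxTransactions int
	maxBodyBytes    int
}

// check is one validation step; it returns an error naming the failure.
type check func(body []byte, l limits) error

func checkJSON(body []byte, _ limits) error {
	if !json.Valid(body) {
		return errors.New("INVALID_JSON")
	}
	return nil
}

func checkMaxTransactions(body []byte, l limits) error {
	var txs []json.RawMessage
	// A body that is not a JSON array holds a single transaction,
	// so there is nothing to count.
	if err := json.Unmarshal(body, &txs); err != nil {
		return nil
	}
	if len(txs) > l.maxTransactions {
		return errors.New("TOO_MANY_TRANSACTIONS")
	}
	return nil
}

func checkSize(body []byte, l limits) error {
	if len(body) > l.maxBodyBytes {
		return errors.New("INVALID_TRANSACTION_SIZE")
	}
	return nil
}

// validate runs the checks one after another and stops at the first failure.
func validate(body []byte, l limits, checks ...check) error {
	for _, c := range checks {
		if err := c(body, l); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	l := limits{maxTransactions: 2, maxBodyBytes: 1024}
	checks := []check{checkJSON, checkMaxTransactions, checkSize}
	for _, body := range []string{`{"a":1}`, `{"a":`, `[{},{},{}]`} {
		fmt.Printf("%-12s -> %v\n", body, validate([]byte(body), l, checks...))
	}
}
```

Because the chain stops at the first failing check, a request that is both malformed and oversized is reported only as INVALID_JSON in this sketch.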

fail2ban-relay.go

fail2ban-relay.go implements Relay mode.

  • It is very simple: it writes the log messages received at /patroneos/fail2ban-relay to a log file so that fail2ban can scan them.
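A relay of this kind can be sketched in a few lines. The following is an illustration only, not the code in fail2ban-relay.go: the relayLog type, the formatLine helper and the message format are assumptions, and an in-memory buffer stands in for the log file so that the example runs without touching the disk or the network.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// formatLine turns one relayed message into exactly one log line, so that a
// line-oriented scanner such as fail2ban sees one event per line.
func formatLine(msg []byte) []byte {
	oneLine := bytes.Replace(bytes.TrimSpace(msg), []byte("\n"), []byte(" "), -1)
	return append(oneLine, '\n')
}

// relayLog appends every received message to out. The mutex keeps lines
// written by concurrent requests from interleaving.
type relayLog struct {
	mu  sync.Mutex
	out io.Writer
}

func (l *relayLog) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	msg, err := ioutil.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	l.mu.Lock()
	_, err = l.out.Write(formatLine(msg))
	l.mu.Unlock()
	if err != nil {
		http.Error(w, "cannot write log", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// A real relay would open the log file instead, for example with
	// os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644).
	var logFile bytes.Buffer
	relay := &relayLog{out: &logFile}

	// Simulate one POST from a filter-mode instance without opening a socket.
	// The JSON payload is a made-up example.
	req := httptest.NewRequest("POST", "/patroneos/fail2ban-relay",
		strings.NewReader(`{"message":"INVALID_JSON","host":"203.0.113.7"}`))
	rec := httptest.NewRecorder()
	relay.ServeHTTP(rec, req)

	fmt.Println("status:", rec.Code)
	fmt.Print(logFile.String())
}
```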

Review

Goroutine

Patroneos is written in Go, the programming language Google released in 2009. Go provides the goroutine as its concurrency mechanism. A goroutine is a lightweight thread managed by the Go runtime. When you call a function with the keyword "go", the runtime executes that function concurrently with the caller in the same memory address space.

A Go program can run in parallel on multiple CPUs or cores. The runtime.GOMAXPROCS() function sets the number of logical cores the program may use. Since version 1.5, Go uses all logical cores of the machine by default. Functions called with "go" are therefore processed in parallel on a multicore machine.

As of June 1, 2018, a typical installation provides Go 1.10.2. Ubuntu 18.04 LTS and macOS High Sierra 10.13.4 install golang through apt and brew, respectively, and both provide 1.10.2. CentOS 7.5 installs version 1.9.4.

Consequently, with a recent version of Go, goroutines run in parallel on a multicore machine without any extra configuration.
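As a small illustration of the mechanism, the sketch below starts several goroutines with the "go" keyword and waits for them with a sync.WaitGroup. The function sumSquares is a made-up example and has nothing to do with Patroneos itself.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares adds up 1*1 + 2*2 + ... + n*n, splitting the work across
// `workers` goroutines. Each goroutine writes only to its own slot of
// partial, so no locking is needed.
func sumSquares(n, workers int) int {
	var wg sync.WaitGroup
	partial := make([]int, workers)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w + 1; i <= n; i += workers {
				partial[w] += i * i
			}
		}(w)
	}
	wg.Wait() // block until every goroutine has called Done

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("logical CPUs:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("sum of squares up to 100:", sumSquares(100, 4))
}
```

Whether the four goroutines actually run at the same time depends on GOMAXPROCS and on the number of cores; the result is the same either way.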

HTTP Request processing step: HTTP ServeMux & ListenAndServe

In the Serve() function of net/http, each accepted connection is handled by its own goroutine. The part that receives HTTP requests is therefore processed in parallel on multiple cores.

ListenAndServe uses the default http.Server, which has no timeouts. At the time of analysis, Patroneos uses this Server without any timeout setting. If nothing in front of Patroneos enforces a timeout, Patroneos waits indefinitely when a client opens an HTTP connection and then sends no data.

Therefore, we recommend an architecture in which 1) Patroneos does not receive client requests directly, and 2) the HTTP requests that reach Patroneos are subject to a timeout.
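For reference, explicit server-side timeouts in net/http look roughly like this. The helper newServer and all duration values are illustrative assumptions, not settings taken from Patroneos or tuned for any workload.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer returns an http.Server with explicit timeouts instead of the
// zero values (no timeout) that http.ListenAndServe uses.
// The durations are placeholders chosen for the example.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 5 * time.Second,  // time allowed to read the request headers
		ReadTimeout:       10 * time.Second, // time allowed to read the whole request
		WriteTimeout:      15 * time.Second, // time allowed to write the response
		IdleTimeout:       60 * time.Second, // how long an idle keep-alive connection stays open
	}
}

func main() {
	srv := newServer(":8081", http.NewServeMux())
	fmt.Println(srv.ReadHeaderTimeout, srv.ReadTimeout, srv.WriteTimeout, srv.IdleTimeout)
	// To actually serve requests: log.Fatal(srv.ListenAndServe())
}
```

A reverse proxy in front of Patroneos can enforce similar limits without code changes, which is the architecture recommended above.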

Request Validation step

The 5 validations are not started with the "go" keyword (as goroutines); they run in series. Within a single request, the validation is therefore not processed in parallel at all. The validation logic is currently simple enough that it does not need to be.

As ServeMux.HandleFunc is executed after receiving all the HTTP body from the client, the validation logic does not have a timeout issue.

When a relay is used, a timeout may occur, but as described below this is unlikely to be a significant problem.

Validation Result Relay and API Endpoint forwarding step: HTTP Client

Go's HTTP Client is documented as safe for concurrent use by multiple goroutines.

The HTTP Client has no timeout unless one is set explicitly. At the time of this analysis, Patroneos uses the default HTTP Client both to send filter results to the Relay Patroneos and to replay requests to the API Endpoint. In both cases, no timeout is set.

If the API Endpoint that receives the verified requests has no appropriate timeout of its own, Patroneos waits indefinitely for a response.

Therefore, we recommend that the API Endpoint set an appropriate timeout.

Let RP denote a Patroneos in Relay mode and FP a Patroneos in Filter mode. If FP does not use RP, there is no issue. Now assume that FP uses RP. If RP is off or responds normally, there is no issue. Because the logic of RP is so simple, its response is very rarely delayed, except in cases such as file descriptor exhaustion or delayed processing. A serious problem would arise if a different TCP/HTTP server that never responds were listening on the port RP is supposed to use, but this case is also very rare. Therefore, when FP uses RP, timeout-related issues are expected to be very rare.
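On the client side, a timeout can be attached to the http.Client and its Transport as in the sketch below. The helper newClient and every value in it are assumptions made for illustration; Patroneos at the analyzed revision uses the default client instead.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newClient builds an http.Client whose requests cannot wait forever.
// total bounds the whole exchange (connect, request, response body);
// the Transport values are placeholders, not tuned recommendations.
func newClient(total time.Duration) *http.Client {
	transport := &http.Transport{
		// Bound the time spent establishing the TCP connection.
		DialContext: (&net.Dialer{Timeout: 3 * time.Second}).DialContext,
		// Bound the wait for response headers after the request is sent.
		ResponseHeaderTimeout: 5 * time.Second,
		// Keep enough idle connections to the single upstream host
		// so that they can be reused between requests.
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100,
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: transport, Timeout: total}
}

func main() {
	client := newClient(10 * time.Second)
	fmt.Println("overall timeout:", client.Timeout)
	// A forwarding call would then look like:
	//   resp, err := client.Post(url, "application/json", body)
	// and returns an error once the timeout expires.
}
```

The default MaxIdleConnsPerHost in net/http is 2, so raising it matters when all traffic goes to one API Endpoint, as it does here.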

Benchmark

The focus was on measuring the processing capacity of Patroneos itself, so we configured simple HTTP requests and a simple API endpoint. This should be understood as a laboratory benchmark.

The tests use JSON bodies of two different sizes and two HTTP request concurrency levels, 100 and 1000. To see what happens when the API endpoint responds slowly, the tests use two latency settings, 0 ms and 100 ms.

Test Configuration

Below is the test configuration for the benchmark. In a production environment, the settings must be adjusted to the actual situation.

  • System
    • OS: Ubuntu 18.04 LTS
    • CPU: Intel i7-6700 CPU @ 3.40GHz / 8 logical cores
    • Memory: 32GB
    • parameter
      • max open files: 500,000
      • net.ipv4.tcp_tw_reuse = 1
      • net.ipv4.ip_local_port_range = "10000 65000"
  • HTTP request generation
  • Patroneos configuration and settings
    • filter patroneos
{
   "listenPort": "8081",

   "nodeosProtocol": "http",
   "nodeosUrl": "127.0.0.1",
   "nodeosPort": "8000",

   "contractBlackList": {
       "currency": true
   },
   "maxSignatures": 10,
   "maxTransactionSize": 1000000,

   "logEndpoints": ["http://127.0.0.1:8080"],
   "filterEndpoints": [],

   "logFileLocation": "./fail2ban.log"
}
  • relay patroneos
{
   "listenPort": "8080",

   "nodeosProtocol": "http",
   "nodeosUrl": "127.0.0.1",
   "nodeosPort": "8000",

   "contractBlackList": {
       "currency": true
   },
   "maxSignatures": 10,
   "maxTransactionSize": 1000000,

   "logEndpoints": [],
   "filterEndpoints": ["http://127.0.0.1:8081"],

   "logFileLocation": "./fail2ban.log"
}
  • Dummy API Endpoint using Go : use 1 core only with runtime.GOMAXPROCS(1)
package main

import (
   "fmt"
   "io/ioutil"
   "log"
   "net/http"
   "runtime"
   "time"
)

func main() {
   // Restrict the dummy endpoint to a single core.
   runtime.GOMAXPROCS(1)
   http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
       if len(r.FormValue("case-two")) > 0 {
           fmt.Println("case two")
           return
       }
       // Simulated API processing latency (100 ms in this listing).
       time.Sleep(time.Millisecond * 100)
       b, err := ioutil.ReadAll(r.Body)
       if err != nil {
           // Fail this request only; log.Fatal here would stop the whole server.
           http.Error(w, err.Error(), http.StatusBadRequest)
           return
       }
       // Prints the request body as a byte slice.
       fmt.Println(b)
   })

   if err := http.ListenAndServe(":8000", nil); err != nil {
       log.Fatal(err)
   }
}

Test methods and results

  • JSON-A type for test request
{"account": "initb", "permission": "init", "authorization" active "}]," data ":" 000000000041934b000000008041934be803000000000000 "}
  • JSON-B type for test request
{ "Id": "37df4598d37bb8fdbc440e31caae07906ac90fd3fd2cd060f2ca13e59e78781e", "signatures": [ "SIG_K1_K8ojKDxMnWy5Q3zAVQPwJANbEE2h9kStmPX4BorEGGQKCJXUYK62UiEYxGyQbaynraMX5WvzEFYaQqAf5Mdwu2yBf36HG7"], "compression": "none", "packed_context_free_data": "", "context_free_data": [], "packed_trx": "5f3b125b17c726f418ba000000000100a6823403ea3055000000572d3ccdcd010000000000ea305500000000a8ed32322e0000000000ea305590d5cc5865570da420a107000000000004454f53000000000d4a756e676c652046617563657400", " 0, "max_cpu_usage_ms": 0, "delay_sec": 0, "ref_block_num": 50967, "ref_block_prefix": 3122197542, "max_net_usage_words" "eosio.token", "name": "transfer", "authorization": [{"actor": "eosio", "permission": " "memo": "Jungle Faucet"}, "hex_data": "active"}, "data": {"from": "eosio" 0000000000ea305590d5cc5865570da420a1070000000004454f53000000000d4a756e676c6520466175636574 "}]," transaction_extensions ": [] }}
  • Test # 1

    • HTTP request
      • request JSON-A type
      • 100 concurrent goroutine requests, 100,000 total requests
    • Dummy
      • Response Latency 0ms
    • Result
      • Patroneos processing result: total time 15.07 seconds, average time per request 0.0149 seconds, 6635.07 TPS
      • Memory usage: filter mode 17.1MB, relay mode 11.8MB
      • CPU usage: Max 15% per thread
  • Test # 2

    • HTTP request
      • request JSON-A type
      • 100 concurrent goroutine requests, 100,000 total requests
    • Dummy
      • Response Latency 100ms
    • Result
      • Patroneos processing result: total time 103.5748 seconds, average time per request 0.1033 seconds, 965.48 TPS
      • Memory usage: filter mode 17.1MB, relay mode 11.8MB
      • CPU usage: Max 9.7% per thread
  • Test # 3

    • HTTP request
      • request JSON-A type
      • 1000 concurrent goroutine requests, 500,000 total requests
    • Dummy
      • Response Latency 0ms
    • Result
      • Patroneos processing result: total time 69.79 seconds, average time per request 0.1060 seconds, 7163.58 TPS
      • Memory usage: filter mode 391.7MB, relay mode 23.7MB
      • CPU usage: Max 26.9% per thread
  • Test # 4

    • HTTP request
      • request JSON-A type
      • 1000 concurrent goroutine requests, 500,000 total requests
    • Dummy
      • Response Latency 100ms
    • Result
      • Patroneos processing result: total time 76.2931 seconds, average time per request 0.1486 seconds, 6553.67 TPS
      • Memory usage: filter mode 143.4MB, relay mode 20.0MB
      • CPU usage: Max 25.0% per thread
  • Test # 5

    • HTTP request
      • request JSON-B type
      • 100 concurrent goroutine requests, 100,000 total requests
    • Dummy
      • Response Latency 0ms
    • Result
      • Patroneos processing result: total time 29.53 seconds, average time per request 0.0293 seconds, 3385.76 TPS
      • Memory usage: filter mode 17.MB, relay mode 11.7MB
      • CPU usage: Max 7.6% per thread
  • Test # 6

    • HTTP request
      • request JSON-B type
      • 100 concurrent goroutine requests, 100,000 total requests
    • Dummy
      • Response Latency 100ms
    • Result
      • Patroneos processing result: total time 109.91 seconds, average time per request 0.1093 seconds, 909.80 TPS
      • Memory usage: filter mode 15.8MB, relay mode 11.0MB
      • CPU usage: Max 5.3% per thread
  • Test # 7

    • HTTP request
      • request JSON-B type
      • 1000 concurrent goroutine requests, 500,000 total requests
    • Dummy
      • Response Latency 0ms
    • Result
      • Patroneos processing result: total time 148.5215 seconds, average time per request 0.2930 seconds, 3366.51 TPS
      • Memory usage: filter mode 110.5MB, relay mode 14.3MB
      • CPU usage: Max 9.2% per thread
  • Test # 8

    • HTTP request
      • request JSON-B type
      • 1000 concurrent goroutine requests, 500,000 total requests
    • Dummy
      • Response Latency 100ms
    • Result
      • Patroneos processing result: total time 153.37 seconds, average time per request 0.3046 seconds, 3259.90 TPS
      • Memory usage: filter mode 103.0MB, relay mode 13.5MB
      • CPU usage: Max 8.6% per thread

Result summary

  • Memory usage
    • Filter mode: < 400MB
    • Relay mode: < 24MB
  • CPU usage: < 27% per thread (maximum 26.9%, in Test # 3)
  • File Descriptor: no issues with environment above
  • DNS resolution: the configurations use IP addresses only, so DNS lookups were minimized
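One way to read the 100 ms results: when every in-flight request waits at least the endpoint latency, throughput cannot exceed concurrency divided by latency. The sketch below computes that bound; maxTPS is a name introduced here for illustration.

```go
package main

import "fmt"

// maxTPS returns the upper bound on requests per second when `concurrency`
// requests are in flight and each one takes at least latencyMs milliseconds.
func maxTPS(concurrency int, latencyMs float64) float64 {
	return float64(concurrency) * 1000.0 / latencyMs
}

func main() {
	fmt.Printf("concurrency  100, latency 100 ms: at most %.0f TPS\n", maxTPS(100, 100))
	fmt.Printf("concurrency 1000, latency 100 ms: at most %.0f TPS\n", maxTPS(1000, 100))
}
```

With 100 concurrent requests the bound is 1000 TPS, and Tests # 2 and # 6 reached 965.48 and 909.80 TPS, so those runs were limited mainly by the simulated latency. With 1000 concurrent requests the bound is 10000 TPS, while Tests # 4 and # 8 reached 6553.67 and 3259.90 TPS, which suggests that something other than the latency was the limit in those runs.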

Recommendations and Conclusions

  • Patroneos operating recommendations (for block producers)
    • Prepare for a malfunctioning Patroneos: build an architecture layer that can bypass Patroneos immediately when necessary, and practice switching the bypass on and off.
    • Monitor CPU utilization, file descriptor errors, etc. to determine scale-out criteria for the Patroneos layer.
    • Build an architecture in which a Patroneos operating in Filter mode does not receive client requests directly and receives HTTP requests subject to a timeout.
    • The API Endpoint must set an appropriate timeout.
    • Make sure that no other service is listening on the TCP port intended for the Patroneos operating in Relay mode.
    • Use an IP address instead of an FQDN in the nodeosUrl setting, if possible, to minimize DNS resolution overhead.
    • Allow enough file descriptors per Patroneos process and tune the necessary system parameters.
    • When fail2ban-relay is used, it helps operation and monitoring to give the binaries distinct names, e.g. patroneos-filter and patroneos-relay.
    • Enable TCP port reuse and configure a sufficient port range.
    • An SSD is recommended for writing and rotating the fail2ban.log file.
    • At the time of this analysis, there was a problem in handling the JSON arrays of push_transactions, which we reported as Patroneos Issue # 26. Until it is resolved, implement a URI route bypass or reroute for HTTP requests that use push_transactions.
    • Update: [Patroneos Issue #26] is resolved. push_transactions can be used safely.
    • If you need to open a public API Endpoint, set access-control-allow-origin to * in the nodeos config.
      • If in doubt, see the section on same-origin policy & CORS in the Reference below.
  • Patroneos Implementation Recommendations (to Patroneos Committer & Block.one)

    • We recommend reflecting timeouts properly in the HTTP Server and Client configuration.
      • HTTP Server: ReadTimeout, ReadHeaderTimeout, WriteTimeout
      • Timeout of the HTTP Client
      • Use an HTTP Transport for the HTTP Client
      • Tune the parameters of the Transport: MaxIdleConnsPerHost, MaxIdleConns, IdleConnTimeout, ResponseHeaderTimeout, net.Dialer.Timeout
      • For more information, see the following section of the Reference: Go net/http implementation recommendations
    • Resolve Patroneos Issue # 26 -- FIXED
  • Conclusion

    • If you follow the recommendations above, we do not expect significant functional issues, as of the time of writing, when using Patroneos in a live production environment.
    • Note: Until Patroneos Issue # 26 is resolved, either change the URL route or let HTTP push_transactions requests bypass Patroneos in the architecture.
    • Add timeout settings to the Patroneos code so that the timeouts of the server and the client can be fine-tuned.

Reference

Any Feedback

Suggestions and questions are always welcome. Please do not hesitate to give feedback to EOSeoul. Join the Telegram groups below for the latest news from EOSeoul and technical discussions about EOS.

Thank you!

EOSeoul

Telegram (English) : http://t.me/eoseoul_en
Telegram (简体中文) : http://t.me/eoseoul_cn
Telegram (日本語) : http://t.me/eoseoul_jp
Telegram (General Talk, 한국어) : https://t.me/eoseoul
Telegram (Developer Talk, 한국어) : https://t.me/eoseoul_testnet
Steemit : https://steemit.com/@eoseoul
Github : https://github.com/eoseoul
Twitter : https://twitter.com/eoseoul_kor
Facebook : https://www.facebook.com/EOSeoul.kr
Wechat account: neoply
EOSeoul Documentations : https://github.com/eoseoul/docs


Thank you for the nice write-up. We are working to address some of the issues/recommendations you mentioned.

@eoseol can you provide a repository with the code you use to run your benchmarks? That would help us to try and find max threshold under different conditions.
