SPS Validator Development Update and Invoice #1

in #spsdao, 4 months ago (edited)

Validators Development

JPTR Corporation

This document summarizes the tickets we closed over the past 30 days and gives an overview of the work currently in progress. The summaries below were generated by ChatGPT from our Jira issues.

We hope this overview highlights the hard work and dedication that has gone into each task, ensuring that our development efforts are both transparent and impactful.

A payment of $40,000.00 is due to JPTR Corporation. Please make the payment in USDC to the Ethereum address:
0x57d917726073D7582022897F753B034aA593220c.

Note: Before submitting the full amount, please send a minimal test transaction to verify the accuracy of the transfer. Once confirmed, proceed with the full payment.

TLDR

  • Migrated the unclaimed balances system to the validator and added new API endpoints for user access. Refactored the validator code, removing over 1,000 lines of duplication.
  • Removed outdated features like voucher staking rewards, license voucher reward pools, and discounts, simplifying the system.
  • Updated and validated tests after these changes to ensure system stability.
  • Enhanced CI/CD pipelines for better deployment in QA environments.
  • Upgraded Hive nodes to the latest version, fixed critical issues, and improved deployment scripts for better node management.
  • Overhauled the hived-proxy with better caching and compression, and resolved block corruption issues.
  • Launched and configured the AWS account for the sl-validator environment, including schema migrations and service updates.

Migration & Refactoring: Enhancing System Efficiency and Developer Experience

  1. Full Migration of the Unclaimed Balances System: Transitioned the remaining components of the unclaimed balances system from the legacy SM environment to the validator. This migration was critical to align the unclaimed balances system with the new infrastructure, ensuring full compatibility with ongoing updates and improvements.

  2. Support for License Reward Pools: Integrated comprehensive support for license reward pools within the validator, facilitating smoother reward distribution and management.

  3. Introduction of New API Endpoints for Enhanced User Interaction: Developed and deployed new API endpoints designed to allow users to easily retrieve their unclaimed balances. These endpoints provide greater transparency and control over individual reward accrual, enhancing user satisfaction and trust.

  4. Extensive Validator Code Refactoring for Open Source Development: Conducted a thorough cleanup of the validator operation handling codebase. This included removing over 1,000 lines of redundant and duplicated code, significantly reducing code complexity. This refactoring effort was aimed at simplifying ongoing development, reducing the potential for errors, and making future updates more manageable.

  5. Voucher Staking Rewards Decommissioning: Fully removed the voucher staking rewards from the system. This decision was part of a broader strategy to simplify the reward mechanisms and focus on more streamlined and effective incentives for users.

  6. Phasing Out of Voucher Reward Pools: Methodically removed all components associated with the voucher reward pools that were tied to license purchases. This action was necessary to realign the reward structure with current business objectives and user expectations.

  7. Elimination of Voucher Discounts in the Validator Shop: Discontinued the voucher discount feature for license purchases within the validator shop. This step was taken to simplify the purchasing process and eliminate any potential confusion among users regarding discount eligibility.

  8. System-Wide Simplification Through Feature Removal: By removing outdated or redundant features, the system was streamlined, resulting in a more efficient and user-friendly experience. These removals were carefully planned to ensure minimal disruption while maximizing system performance and clarity.

Testing & Validation: Ensuring Robustness After Major Changes

  1. Comprehensive Test Updates Following System Migration: Following the migration of the unclaimed balances system to the validator, all associated tests were meticulously updated. This ensured that the new system's functionality was fully validated and that no unintended issues were introduced during the migration.

  2. Validation Post-Feature Decommissioning: After the removal of voucher staking rewards and reward pools, a thorough review and update of all relevant tests were conducted. This step was crucial to maintaining the integrity and reliability of the system after these significant changes.

  3. Extensive Test Revisions for System Consistency: Ensured that all tests were aligned with the current state of the system, reflecting the removal of several features and the introduction of new functionalities. This extensive testing phase was critical to maintaining high standards of reliability and performance.

Continuous Integration & Deployment: Optimizing Deployment Processes

  1. CI/CD Pipeline Enhancement for QA Environment Support: Upgraded the continuous integration and deployment (CI/CD) pipeline to better support deployments within the Splinterlands QA environments. This enhancement facilitates more efficient testing and validation processes, enabling faster iteration cycles and more robust releases.

  2. Optimized Deployment Workflow for Consistent Updates: Improved the overall deployment workflow to ensure that updates are rolled out consistently and reliably across all environments. This optimization reduces the risk of deployment errors and ensures that all environments remain in sync with the latest codebase.

Hive Node Management: Upgrading and Stabilizing Critical Infrastructure

  1. Hive Node Issue Resolution for Improved Stability: Resolved critical issues with multiple Hive nodes (including hived.splinterlands.com, hived-2.splinterlands.com, and the internal Splinterlands node). The main symptom was get_account_history calls returning invalid responses. After a deep investigation, the fix required a full replay of the nodes because the hived state format changed in version 1.27.4.

  2. Full Replay Implementation to Support Latest Node Version: Performed a full replay of the Hive nodes to accommodate the changes in the state format, ensuring that all nodes are fully updated and operating correctly on the latest version, 1.27.6.

  3. Node Upgrade and Migration to Ubuntu 22.04: Upgraded all Hive nodes to the latest software version (1.27.6) and migrated the underlying operating system to Ubuntu 22.04. This upgrade was necessary to ensure continued compatibility with the latest features and security updates, as well as to improve overall node performance.

  4. Docker Health Check Integration for Node Stability: Implemented a Docker health check mechanism to automatically monitor and restart nodes if they become unresponsive or encounter errors. This proactive approach helps to maintain node stability and minimize downtime.

  5. Monitoring Node Sync Progress Post-Upgrade: Actively monitored the sync process of the upgraded nodes to ensure successful and complete deployment. This ongoing oversight is critical to confirming that all nodes are fully operational and synchronized.

  6. Deployment Script Rewriting for Simplified Node Management: Rewrote the existing Docker deployment scripts to simplify the process of deploying and replaying Hive nodes. These improvements make it easier to manage node upgrades and ensure that all nodes remain in a consistent and reliable state.
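
To illustrate the Docker health check item above, here is a minimal sketch of a probe that a container health check could run. It is not the script used on the Splinterlands nodes. It assumes the node exposes the standard condenser_api.get_dynamic_global_properties JSON-RPC call over HTTP and that a head block older than 60 seconds means the node has stalled. The threshold and the function names are illustrative assumptions.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request, urlopen

# Assumed threshold: Hive produces a block roughly every 3 seconds,
# so a head block older than this suggests the node has stalled.
MAX_HEAD_AGE = timedelta(seconds=60)


def build_request() -> dict:
    """JSON-RPC body asking the node for its global chain properties."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_dynamic_global_properties",
        "params": [],
        "id": 1,
    }


def head_is_fresh(props: dict, now: datetime,
                  max_age: timedelta = MAX_HEAD_AGE) -> bool:
    """Return True if the head block timestamp is recent enough.

    hived reports `time` in UTC without a zone suffix,
    e.g. "2024-08-15T12:00:00".
    """
    head_time = datetime.strptime(props["time"], "%Y-%m-%dT%H:%M:%S")
    head_time = head_time.replace(tzinfo=timezone.utc)
    return now - head_time <= max_age


def check_node(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the node answers with a fresh head block, else 1."""
    body = json.dumps(build_request()).encode("utf-8")
    req = Request(url, data=body, headers={"Content-Type": "application/json"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            props = json.load(resp)["result"]
        return 0 if head_is_fresh(props, datetime.now(timezone.utc)) else 1
    except Exception:
        # Any network, HTTP, or parsing failure counts as unhealthy.
        return 1
```

A health check command would call check_node with the node's local HTTP endpoint and pass the result to sys.exit. Docker reads exit code 0 as healthy and 1 as unhealthy.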

Proxy & Caching: Enhancing Data Integrity and Performance

  1. Complete hived-proxy Overhaul: Rewrote the hived-proxy to include enhanced caching mechanisms for the latest live blocks and introduced a gzip compression pipeline. This overhaul was necessary to optimize performance and reduce latency in serving live data.

  2. Resolution of Cloudflare TransformStream Issue: Identified and resolved an issue with Cloudflare’s TransformStream update, which caused corruption in blocks beyond the 78 million mark. This required pinpointing the exact block where corruption began and removing all affected blocks to restore data integrity.

  3. Reimplementation of CompressionStream for Data Storage: Reimplemented the CompressionStream to store up to 1,000 blocks in R2 storage using gzip. This change was crucial for optimizing storage space while ensuring that stored data remains easily accessible and uncorrupted.

  4. Block Integrity Verification for R2 Storage: Verified that all blocks stored in R2 storage are consistent with the new data structure and can be successfully decompressed. This verification process was essential to ensuring that the system's data integrity was maintained after the updates.

  5. Cronjob Optimization for Latest Block Retrieval: Rewrote the cronjob responsible for retrieving and caching the latest 1,000 blocks. This optimization ensures that the system always has the most recent data available, improving the performance and responsiveness of live data services.
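
The batching, compression, and verification steps above can be sketched as follows. This is an illustration in Python under stated assumptions, not the proxy's actual code. The object key layout, the block_num field, and the helper names are hypothetical. The binary search assumes that every block before the corruption point is intact and every block after it is corrupted, which the report does not confirm.

```python
import gzip
import json
import zlib

BATCH_SIZE = 1000  # blocks per stored object, per the figure quoted above


def batch_key(block_num: int) -> str:
    """Hypothetical object key for the batch that contains block_num."""
    start = (block_num - 1) // BATCH_SIZE * BATCH_SIZE + 1
    return f"blocks/{start}-{start + BATCH_SIZE - 1}.json.gz"


def pack_blocks(blocks: list) -> bytes:
    """Serialize a batch of blocks to compact JSON and gzip it."""
    raw = json.dumps(blocks, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack_blocks(payload: bytes) -> list:
    """Inverse of pack_blocks."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


def verify_batch(payload: bytes, first_block: int) -> bool:
    """True if the payload decompresses, parses, and holds consecutive blocks.

    Assumes each stored block carries a `block_num` field (hypothetical).
    """
    try:
        blocks = unpack_blocks(payload)
        numbers = [b["block_num"] for b in blocks]
    except (OSError, EOFError, zlib.error, ValueError, KeyError, TypeError):
        return False
    return numbers == list(range(first_block, first_block + len(numbers)))


def first_corrupt(lo: int, hi: int, is_ok) -> int:
    """Smallest n in [lo, hi] for which is_ok(n) is False.

    Binary search; returns hi + 1 if nothing in the range is corrupt.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_ok(mid):
            lo = mid + 1
        else:
            hi = mid - 1
    return lo
```

With a check such as verify_batch wrapped in is_ok, first_corrupt needs only about 27 probes to locate the first bad block in a range of 78 million, instead of scanning every block.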

sl-validator Environment Deployment: Establishing a Robust New Environment

  1. Terraform Configuration Update for OIDC Integration: Updated Terraform configurations to support OpenID Connect (OIDC) from GitHub for deployments. This update was necessary to ensure secure and streamlined authentication processes during deployments.

  2. AWS Account Launch and Setup for sl-validator: Launched a new AWS account specifically for the sl-validator environment. This included applying a comprehensive bootstrap environment to ensure that all necessary configurations and resources were correctly established from the outset.

  3. Database Migrations and User Creation for sl-validator: Successfully applied database migrations to the sl-validator environment, ensuring that all necessary schema changes were in place. Additionally, created the required user accounts to support ongoing operations within the sl-validator environment.

  4. Service Configuration for Validator Operations: Updated existing services to ensure they are compatible with and fully support validator operations. This included managing configuration files and secrets to align with the unique requirements of the sl-validator environment.

  5. Schema Separation for Validator Environment: Adjusted the validator environment to operate within a completely separate schema, preventing potential conflicts with other environments and ensuring that validator operations remain isolated and secure.

  6. Schema Migrations for Validator Integration: Applied the necessary schema migrations to fully integrate the validator within the new environment. This step was critical to ensuring that all aspects of the validator's functionality were properly supported and aligned with the new infrastructure.
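
For readers unfamiliar with the GitHub OIDC setup mentioned in item 1, the core of it is an IAM role trust policy that lets a GitHub Actions workflow exchange its short-lived OIDC token for temporary AWS credentials. The sketch below builds such a policy as a Python dictionary purely for illustration. The actual configuration is written in Terraform and is not shown in this report, and the account ID, repository name, and branch here are placeholders.

```python
import json

# Issuer host for GitHub Actions OIDC tokens.
GITHUB_OIDC_PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str,
                             branch: str = "main") -> dict:
    """Trust policy allowing one repo and branch to assume the role.

    account_id, repo ("owner/name"), and branch are placeholders
    supplied by the caller, not values taken from this report.
    """
    provider_arn = (
        f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_PROVIDER}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must be issued for AWS STS ...
                        f"{GITHUB_OIDC_PROVIDER}:aud": "sts.amazonaws.com",
                        # ... and come from this repository and branch.
                        f"{GITHUB_OIDC_PROVIDER}:sub":
                            f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


def render(policy: dict) -> str:
    """Pretty-print the policy as JSON."""
    return json.dumps(policy, indent=2)
```

Restricting the sub claim to a single repository and branch means a workflow in any other repository cannot assume the deployment role, even though it can obtain a token from the same issuer.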


Have you really copy/pasted images into a Hive post? Spending 2-3 minutes to copy/paste the actual text was too much work?

This is honestly a bad way to keep communication open with the community.

Downvoting because it's not a pleasure to read a post in this format and I think Hive rewards should be used for people putting more effort into their content.

Thanks for the update @jptrcorp. Much better now 👍

PS: Downvote removed

For a layman like me, is it possible to give a % of work completed (or remaining) toward the planned end-of-year completion and deployment?

Thanks for the comprehensive report and all the hard work that went into it! I’ve got a few questions to clarify some technical details:

Hive - Node Questions:

  1. Could you please provide more details on how you're running the Hive nodes? Specifically, what Docker image are you using? Are these full nodes? If possible, could you share the docker-compose file or some documentation to help us better understand the setup?
  2. Are there any special configurations needed to run the validator network on Hive nodes, or can we use any public Hive node for this purpose?

Proxy & Caching:
From what I understand, SPL runs its own Hive nodes that are cached through Cloudflare, which is then used by the game. Will this setup continue to function the same way when the validator network goes live? Additionally, will the costs associated with caching and implementing Cloudflare and R2 Storage continue to be covered by SPL? If so, is this a significant cost factor we should consider?

sl-validator Network:
Could you also clarify the purpose of the AWS account? Will the SPS-chain be able to run independently from AWS and Terraform, or are these dependencies essential?

Looking forward to your insights!

  1. We can share the Dockerfile with you later this week, but to give you a quick summary: our setup is designed to simplify the management of Hive nodes, making it easier to run, reset, delete, or archive nodes. The downside is that it requires a significant amount of storage—about 4TB of NVMe with ZFS.

  2. No special configurations are required. The validator network uses standard Hive API calls, so any public Hive node will work just fine.

  3. Yes, the setup will continue to function the same way when the validator network goes live. As for costs, they are currently covered by SPL, with expenses around $3,000 per month due to the contract. However, the actual runtime costs are lower and probably closer to $1,000 per month. Splinterlands will try to keep it running as a service to the community and to our organization.

  4. The AWS account is currently being used to verify that the validator integrates seamlessly with the existing SteemMonsters API. At this stage, it's primarily for testing. Eventually, the validator will transition to the production account, and the current setup will be phased out. The SPS chain is designed to operate independently of any specific instance of the validator network, including Splinterlands, which will function within a decentralized environment. However, we plan to implement a fallback system where transactions are stored independently in a centralized database for tracking, roll-ups, and our own analytics.

Thanks, PJ, for your response. It really helped to clarify some of my questions.

"We can share the Dockerfile with you later this week."

Yes, that would be incredibly helpful. It would allow me to better understand how everything works and potentially set up a node myself that aligns with the Splinterlands infrastructure.

I’m looking forward to discussing more details like this. I believe it’s crucial to stay engaged with the development within the community so we’re fully prepared when it goes live. I'm still a bit unclear on how or which API calls will be handled by the validator software, but I’m confident we’ll figure it out as we go.

Thanks for sharing! - @azircon
