About Me

Who I Am

I’m a full-stack developer and systems builder who enjoys turning messy, real-world data into clean, dependable workflows. Since 2014, I’ve worked across a wide range of projects—from custom WordPress development and Chrome extensions to Google Apps Script automations, Python scraping pipelines, and Google Cloud deployments. I approach each engagement with the same mindset: design the smallest reliable system that solves the problem today, instrument it so we can see what’s happening tomorrow, and evolve it without breaking what already works.

How I Think About the Stack

My comfort zone spans both sides of the stack. On the frontend I work with semantic HTML5, modern JavaScript (ES6+), and utility-first CSS (Tailwind) or familiar frameworks like Bootstrap to deliver accessible UI quickly. I also build WordPress themes and plugins when a CMS is the right fit, keeping performance and maintainability in mind. On the backend I use Node.js, Python, and PHP when needed, leaning on clean API contracts (REST/GraphQL), strong input validation, and clear logging. For storage, I choose what fits: relational databases (Cloud SQL, Postgres/MySQL), document stores (Firestore), columnar warehouses (BigQuery), or key-value caches (Redis via Memorystore). I deploy to Google Cloud a lot—Cloud Run for containerized services, Cloud Functions for lightweight triggers, Scheduler/Tasks for orchestration, Pub/Sub for eventing, and Secret Manager/KMS for credentials.

E-Commerce Intelligence and Data Workflows

A recurring theme in my recent work is e-commerce intelligence: gathering product data from retailer sites, matching it to marketplace listings (like Amazon), and deciding whether a lead is worth pursuing. I’ve built pipelines that scan Shopify-based and non-Shopify retailers, parse embedded JSON and JSON-LD product data, extract UPC/EAN/GTIN, prices, stock, and variant details, and then match those records to Amazon listings. The outputs are clean CSVs or directly written rows in Google Sheets, formatted for fast review. I’m meticulous about data hygiene: preserving leading zeros on barcodes, computing and checking GTIN check digits when needed, normalizing domains and URLs, de-duplicating by SKU or slug, and standardizing column names (for example, using Barcode, price_extracted, source_url) so downstream tools don’t guess.
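Barcode hygiene in particular is easy to get subtly wrong, so here is a minimal Python sketch of the two rules I lean on most: zero-padding barcodes back to 13 digits (Sheets and CSV round-trips love to strip leading zeros) and validating the GTIN check digit. The function names are mine, for illustration, not from any library.

```python
def gtin_check_digit(body: str) -> int:
    """Check digit for the first 12 digits of a GTIN-13.

    GTIN weights alternate 3/1 starting from the rightmost body digit.
    """
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10


def normalize_barcode(raw) -> str:
    """Restore leading zeros lost in spreadsheets, then verify the code."""
    code = str(raw).strip().zfill(13)
    if not code.isdigit() or len(code) != 13:
        raise ValueError(f"not a GTIN-13: {raw!r}")
    if gtin_check_digit(code[:-1]) != int(code[-1]):
        raise ValueError(f"bad check digit: {code}")
    return code
```

Running every incoming barcode through a gate like this means a truncated `36000291452` comes back out as the canonical `0036000291452`, and a corrupted code fails loudly instead of silently mismatching downstream.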

Decision-making ties in through Keepa and other signals. I’ve worked with BSR thresholds, ROI and margin filters, and rules that distinguish brand-owned listings from riskier third-party situations. I’ve also maintained long lists of U.S. retailers, scoring them for trustworthiness (HQ, payment options, credit-card acceptance, Amazon presence) and generating structured summaries that help teammates move quickly during sourcing sprints. Over time these systems evolved from ad-hoc scripts into repeatable, parameterized flows that can be re-run with different inputs or time windows.
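As a rough illustration, a lead filter along those lines can be a single predicate over thresholds. The numbers and field names below are invented for the example, not real sourcing criteria.

```python
def qualifies(lead, max_bsr=150_000, min_roi=0.30, min_margin=3.00):
    """Pass a lead only if sales rank, ROI, and absolute margin all clear
    the bar and the listing isn't brand-owned (illustrative thresholds)."""
    margin = lead["sell_price"] - lead["cost"] - lead["fees"]
    roi = margin / lead["cost"]
    return (lead["bsr"] <= max_bsr
            and roi >= min_roi
            and margin >= min_margin
            and not lead["brand_owned"])
```

Keeping the thresholds as parameters is what turns an ad-hoc script into a re-runnable flow: the same function serves a conservative sprint and an aggressive one.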

Web Scraping, Resilience, and Scale

Scraping at scale is less about raw speed and more about resilience. I’ve used Python (requests, httpx, BeautifulSoup, lxml), built concurrent fetchers (ThreadPoolExecutor/multiprocessing), and added backoff for status codes like 413/429, SSL quirks, or mid-request disconnects. I rotate proxies when necessary, tune concurrency to avoid cascading failures, and structure the pipeline so partial progress is always saved. On top of that, I add guardrails—timeouts per URL or collection page, progress tracking in Sheets or CSV metadata, color-coding statuses, and “resume from last good state” so a transient error doesn’t mean starting from zero. A lot of runs happen in Google Colab because it’s perfect for quick iteration, shared notebooks, and exporting results directly to Drive or Sheets.
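The retry logic above can be sketched as a generic wrapper. `fetch_with_backoff` and its signature are my own for this example, but the pattern is the one described: exponential backoff plus jitter, treating certain status codes and connection errors as retryable, and surfacing everything else immediately.

```python
import random
import time

RETRYABLE = {413, 429, 500, 502, 503}


def fetch_with_backoff(fetch, url, max_tries=5, base=0.5, sleep=time.sleep):
    """Call fetch(url) -> (status, body); retry retryable statuses with
    exponential backoff plus jitter so concurrent workers don't stampede."""
    for attempt in range(max_tries):
        try:
            status, body = fetch(url)
        except OSError:  # SSL quirks, mid-request disconnects
            status, body = None, None
        if status == 200:
            return body
        if status is not None and status not in RETRYABLE:
            raise RuntimeError(f"{url}: non-retryable status {status}")
        sleep(base * (2 ** attempt) + random.uniform(0, base))
    raise RuntimeError(f"{url}: gave up after {max_tries} tries")
```

Injecting `fetch` and `sleep` as parameters keeps the wrapper testable without a network, and the same shape drops into a thread pool unchanged.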

I’ve also built Chrome extensions when scraping or augmentation needs to happen inside the browser: MV3 service workers, content scripts that read structured data from product pages, cross-tab messaging for side-by-side comparisons, and secure communication with backend APIs. Extensions are great when the site is highly dynamic or gated by anti-automation that still allows human browsing.

Google Apps Script and Spreadsheet-Driven Ops

Many of my workflows center on Google Sheets because it’s an ideal control plane for non-engineers. I write Apps Script functions that pull in URLs, call APIs (including Google Custom Search JSON API), rotate keys when a 429 appears, respect per-key cooldowns, and color cells as steps finish. I also add utilities that clean URLs, strip tracking parameters, normalize hostnames, and create IDs or barcodes with correct check digits—all while preserving leading zeros and formats that Sheets likes to “fix.” When the job is better suited to Python, I still integrate with Sheets to show progress, log errors, or push results, so the team always knows where a run stands.
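On the Python side, the URL-cleaning utilities mentioned above boil down to something like this stdlib-only sketch; the tracking-parameter list is a common-sense assumption, not an exhaustive one.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed common trackers; extend per project.
TRACKING_PREFIXES = ("utm_", "fbclid", "gclid", "mc_")


def clean_url(raw: str) -> str:
    """Lowercase the host, drop tracking params, and strip fragments so the
    same product page always collapses to one canonical source_url."""
    parts = urlsplit(raw.strip())
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.lower().startswith(TRACKING_PREFIXES)]
    host = parts.netloc.lower().removeprefix("www.")
    return urlunsplit((parts.scheme, host, parts.path, urlencode(query), ""))
```

Normalizing before writing to the Sheet is what makes later de-duplication by URL trustworthy: two differently-tracked links to the same product collapse to one row.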

APIs Everywhere: From Search to Social

I integrate with a lot of APIs: Google Custom Search for discovery, YouTube Data API for channel/video analytics and caption processing, Shopify for catalog and price checks, Stripe and payment gateways for e-commerce, Slack and Jira/Linear/ClickUp for notifications and project tracking, and numerous CRMs and knowledge tools (Notion, Airtable, HubSpot). Good API work is mostly about consistent request signing, rate-limit etiquette, retries with jitter, and clear failure surfaces—if we can predict the blast radius of an outage, we can recover in minutes instead of hours.

LLMs, Agents, and RAG

I explore language models pragmatically. I’ve built retrieval-augmented generation (RAG) workflows that chunk product catalogs (including CSV-based inventories), generate embeddings, and index them for hybrid search. The goal is to answer practical questions—matching retailer titles/descriptions to known barcodes, comparing variant attributes across sites, or summarizing differences between two listings. I’ve used OpenAI and Gemini, and I experiment with local models via tooling like Ollama when cost, privacy, or latency considerations apply. What matters most is measurable utility: does the system reduce manual triage time and improve match precision? I build evaluation sets and simple dashboards so we can answer that honestly.
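Here is a toy version of the matching idea, with a hashed-trigram vector standing in where a real embedding call (OpenAI, Gemini, or a local model) would go; everything else — the cosine scoring and the barcode-keyed catalog — mirrors the actual shape of the pipeline.

```python
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedding: hashed character trigrams. In a real pipeline
    this would be an embedding-API call."""
    vec = [0.0] * dim
    t = text.lower()
    for i in range(len(t) - 2):
        vec[int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16) % dim] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query: str, catalog: dict[str, str]) -> str:
    """Return the barcode whose indexed title is closest to the retailer title."""
    q = embed(query)
    return max(catalog, key=lambda code: cosine(q, embed(catalog[code])))
```

The point of keeping the interface this small is evaluation: swap `embed` for the real model, run the same labeled set through `best_match`, and the precision numbers tell you whether the upgrade earned its cost.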

On top of RAG, I prototype lightweight “agents” that chain tools: a search step, a fetch step, a match step, and a verification step with human-readable justifications. These flows are instrumented so we can see which leg failed and why (bad HTML, stale cache, API 429, or a model hallucination). I’m not trying to make magic—I’m trying to make reliable assistants that handle the boring parts and surface the handful of rows a human should actually look at.
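The instrumentation idea is simple enough to sketch: run named steps in sequence and record one trace entry per leg, so a failed run points at the exact step and error message. The step names and runner below are illustrative, not a framework I use verbatim.

```python
def run_pipeline(steps, payload):
    """Run (name, fn) steps in order, threading payload through each.

    Returns (payload, trace); a failed leg appends an error entry and
    stops, so the trace always shows which step broke and why.
    """
    trace = []
    for name, fn in steps:
        try:
            payload = fn(payload)
            trace.append((name, "ok", None))
        except Exception as exc:
            trace.append((name, "error", str(exc)))
            break
    return payload, trace
```

With a trace like this, "the agent failed" becomes "the verify step failed on a stale cache", which is the difference between re-running blindly and fixing the actual leg.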

Google Cloud as a Practical Platform

Google Cloud has been a reliable foundation for my projects. I containerize services for Cloud Run when I need fast cold starts and autoscaling, and I use Cloud Functions for simple triggers and webhooks. Workflows, Eventarc, Scheduler, and Tasks help stitch together periodic jobs and evented flows. Pub/Sub decouples producers and consumers so back-pressure doesn’t break the system. I rely on Cloud Logging and Monitoring (formerly Stackdriver) to set alerts around error rates and latency. When a pipeline becomes analytics-heavy, I move aggregates into BigQuery or, for hot reads, into Bigtable or Redis. Artifact Registry and Cloud Build keep deployment clean, while Secret Manager and KMS handle sensitive material. The net effect is less time babysitting servers and more time refining the data logic that actually creates value.

Quality, Observability, and Data Hygiene

I obsess over the details that keep systems trustworthy. Barcodes maintain their leading zeros end-to-end. Price fields are parsed and normalized into price_extracted instead of ambiguous free text. URLs are cleaned and stored as source_url with consistent shapes. When a site uses JSON-LD, I parse it directly; when it doesn’t, I fall back to robust CSS/XPath selectors with guards for edge-case layouts. I delete truly empty columns, write CSVs with explicit encodings, and include timestamped logs. In Google Sheets, I color-code states (white/pending, yellow/in-progress, green/done, red/error) so anyone can scan a tab and immediately know what happened. The best automation isn’t flashy—it’s predictable.
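For example, a minimal price normalizer into price_extracted can look like the sketch below; it assumes US-style formatting ("$1,299.00") and is an illustration, not the production parser.

```python
import re

_PRICE_RE = re.compile(r"(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?")


def extract_price(raw):
    """Pull a numeric price out of free text for price_extracted.

    Handles US-style strings like "$1,299.00"; returns None when no
    number is present, so blanks stay blank instead of becoming 0.
    """
    m = _PRICE_RE.search(str(raw))
    if not m:
        return None
    whole = m.group(1).replace(",", "")
    cents = m.group(2) or "00"
    return float(f"{whole}.{cents}")
```

Returning None for the no-price case matters: downstream filters can then distinguish "free" from "unknown" instead of treating both as zero.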

Collaboration and Remote Work

Most of my work happens in distributed teams across time zones. I document assumptions in the code and in the Sheet, leave readable logs, and provide small reproducible test cases. I’m mindful of schedules and keep handoffs crisp: when I end a working block, the next person has links, inputs, and instructions to continue. I’ve trained VAs and junior devs to run and troubleshoot these flows, which matters because the throughput of a system is often limited by how easily others can operate it.

What I’m Exploring Next

I’m steadily improving the matching heuristics between retailer catalogs and marketplace listings—more robust fuzzy matching on titles and attributes, better use of embeddings for disambiguation, and clearer signals around brand ownership versus third-party sellers. I’m also investing in evaluation harnesses for RAG systems so we can track precision/recall over time and justify each deployment with numbers, not vibes. On the platform side, I’m simplifying deployment pipelines and expanding observability: small dashboards that show token usage, hit rates, and error classes so we can spot regressions early.
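One cheap baseline for that fuzzy title matching is a token-sorted ratio straight from the standard library; anything heavier — embeddings, attribute-aware scoring — has to beat a sanity check like this to justify itself.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Order-insensitive fuzzy score: sort tokens first so
    "Blue Acme Widget" and "Acme Widget Blue" compare as equal."""
    def norm(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return SequenceMatcher(None, norm(a), norm(b)).ratio()
```

A baseline this small also anchors the evaluation harness: every fancier matcher gets scored against the same labeled pairs, so improvements are numbers, not vibes.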

A Final Note

Whether I’m building a Chrome extension to speed up human review, a Colab notebook that scrapes and cleans thousands of products, a Sheets-driven Apps Script that wrangles APIs, or a Cloud Run service that glues it all together, my goal is the same: ship systems that reduce friction and keep scaling without drama. If that resonates with your process, we’ll work well together.