Webpage to Article CLI

Node.js CLI tool to extract site information and article content into structured JSON with AI-based snippets.

Node.js, TypeScript

About the Project

A Node.js CLI tool that extracts site information and article content into structured JSON, with AI-generated snippets. The project was designed around measurable processes, ease of use, and reliable implementation, with the aim of improving work efficiency and service quality.

Key Features

  • Site Metadata Extraction: Retrieves the site name, description, and logo as the foundation for the content data.
  • HTML-Formatted Article Content Extraction: Preserves the article structure so the extracted content keeps its context.
  • AI-Based Snippet Generation: Uses Google Gemini to generate more informative summaries.
  • Interactive CLI: Provides an intuitive command-line experience that speeds up operational work.
  • Multi-Article Support: Processes multiple articles at once in a single workflow.
  • WordPress-Compatible JSON Output: Produces results that are ready to integrate into a WordPress publishing workflow.
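
As a rough illustration of what "structured JSON" could look like for the features above, here is a minimal sketch. All type and field names (`SiteInfo`, `ArticleEntry`, `buildResult`, `logoUrl`, and so on) are hypothetical and are not taken from the tool itself. The article fields `title`, `content`, `excerpt`, and `status` loosely echo fields of the WordPress REST API post object, which is an assumption about how "WordPress-compatible" might be realized.

```typescript
// Hypothetical output shape; these names are assumptions, not the tool's actual schema.
interface SiteInfo {
  name: string;
  description: string;
  logoUrl: string | null; // null when no logo could be found
}

interface ArticleEntry {
  title: string;
  content: string; // HTML-formatted article body
  excerpt: string; // AI-generated snippet
  status: "draft" | "publish";
}

interface ExtractionResult {
  site: SiteInfo;
  articles: ArticleEntry[];
}

// Combine site metadata with any number of extracted articles.
// Every article is marked as a draft so nothing is published by accident.
function buildResult(
  site: SiteInfo,
  articles: Omit<ArticleEntry, "status">[],
): ExtractionResult {
  return {
    site,
    articles: articles.map((a) => ({ ...a, status: "draft" as const })),
  };
}

const result = buildResult(
  { name: "Example Blog", description: "A sample site", logoUrl: null },
  [{ title: "Hello", content: "<p>Hello world</p>", excerpt: "A short greeting." }],
);

console.log(JSON.stringify(result, null, 2));
```

Keeping the site metadata separate from the article array means one run can carry several articles without repeating the site fields for each one.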

Challenges & Solutions

  • Challenge: Keeping extraction results consistent across widely varying website structures. Solution: An adaptive parsing and data normalization pipeline keeps the JSON output stable and reliable.
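
One way such a normalization step might work is a fallback chain: collect candidate values from several places in the page, clean each one, and take the first usable result. The sketch below is only an illustration under that assumption. The names `RawSiteFields`, `clean`, `firstUsable`, and `normalizeSite`, and the choice of candidate sources, are hypothetical and do not describe the project's actual pipeline.

```typescript
// Hypothetical raw candidates gathered from different places in a page.
interface RawSiteFields {
  ogSiteName?: string | null;
  titleTag?: string | null;
  ogDescription?: string | null;
  metaDescription?: string | null;
}

interface NormalizedSite {
  name: string;
  description: string;
}

// Collapse runs of whitespace and trim; treat empty results as missing.
function clean(value: string | null | undefined): string | null {
  if (value == null) return null;
  const collapsed = value.replace(/\s+/g, " ").trim();
  return collapsed.length > 0 ? collapsed : null;
}

// Return the first candidate that survives cleaning, otherwise the fallback.
function firstUsable(
  candidates: Array<string | null | undefined>,
  fallback: string,
): string {
  for (const candidate of candidates) {
    const cleaned = clean(candidate);
    if (cleaned !== null) return cleaned;
  }
  return fallback;
}

// Always produce the same output shape, whatever the page provided.
function normalizeSite(raw: RawSiteFields, pageUrl: string): NormalizedSite {
  const hostname = new URL(pageUrl).hostname;
  return {
    name: firstUsable([raw.ogSiteName, raw.titleTag], hostname),
    description: firstUsable([raw.ogDescription, raw.metaDescription], ""),
  };
}

const normalized = normalizeSite(
  { ogSiteName: "   ", titleTag: "  Example\n Blog " },
  "https://example.com/posts/1",
);
console.log(normalized);
```

Because every field has a defined fallback, downstream consumers of the JSON never have to handle a missing key, only an empty or default value.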