dajoho ae202c2cbc initial commit. extract scraping logic to dedicated library (was previously in pandorytool)		2026-03-18 08:05:18 +01:00
Files (all from the initial commit above): include/teampandory/retroscrape, src, .gitignore, build.sh, meson.build, README.md

retroscrape

retroscrape is a C++ library and CLI for scraping retro game metadata and media into a backend-neutral data model.

License: GPL-2.0.

The first backend maps ScreenScraper responses into generic fields such as video, videoSmall, poster, and screenshot, so callers are insulated from provider XML naming.
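The mapping described above could look roughly like the following sketch. The enum mirrors the generic field names from this README, but the function name mapProviderMediaType and the provider-side strings ("video-normalized", "box-2D", "ss") are assumptions chosen for illustration, not the library's actual tables.

```cpp
#include <optional>
#include <string_view>

// Stand-in for the generic media kinds named in this README.
enum class MediaType { video, videoSmall, poster, screenshot };

// Hypothetical translation from provider-specific type strings to the
// generic enum. The provider strings are illustrative assumptions.
inline std::optional<MediaType> mapProviderMediaType(std::string_view providerType) {
    if (providerType == "video") return MediaType::video;
    if (providerType == "video-normalized") return MediaType::videoSmall;
    if (providerType == "box-2D") return MediaType::poster;
    if (providerType == "ss") return MediaType::screenshot;
    return std::nullopt;  // unknown provider types yield no generic asset
}
```

Returning an empty optional for unknown strings lets a caller skip provider media that has no generic equivalent instead of failing the whole scrape.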

Dependencies

Required build dependencies:

a C++20 compiler
meson
ninja
pkg-config
libcurl
tinyxml2
OpenSSL

Debian/Ubuntu:

sudo apt-get update
sudo apt-get install -y \
  build-essential meson ninja-build pkg-config \
  libcurl4-openssl-dev libtinyxml2-dev libssl-dev

Fedora:

sudo dnf install -y \
  gcc-c++ meson ninja-build pkgconf-pkg-config \
  libcurl-devel tinyxml2-devel openssl-devel

Build

Wrapper script:

./build.sh
./build.sh linux64

Direct Meson usage:

meson setup build
meson compile -C build

Library Usage

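#include <iostream>
#include <teampandory/retroscrape/retroscrape.hpp>

using namespace teampandory::retroscrape;

int main() {
    Scraper scraper;
    // Optional: override the compiled-in backend credentials from the environment.
    scraper.setBackendAuthentication(
        Scraper::backendAuthenticationFromEnvironment());
    // Scraper service account credentials are always supplied by the caller.
    scraper.setAccountCredentials("my-user", "my-pass");

    // The platform is a backend-neutral name; retroscrape maps it internally.
    GameRecord game = scraper.scrapeRom({
        .romPath = "sf2.zip",
        .platform = "fba",
    });

    std::cout << game.title << '\n';
    // Prints uncached, cachedMemory, or cachedDisk.
    std::cout << Scraper::cacheStatusName(game.cacheStatus) << '\n';

    // Download every video asset, reporting progress as transferred/total.
    // Each match is written to the same path, so a later one overwrites an earlier one.
    for (const auto &asset : game.media) {
        if (asset.type == MediaType::video) {
            scraper.downloadMediaAsset(
                asset,
                "downloads/video.mp4",
                [](const DownloadProgress &progress) {
                    std::cout << progress.transferred << "/" << progress.total
                              << '\n';
                });
        }
    }
}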
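
The progress callback above prints raw transferred/total values. A caller that prefers a percentage could use a small helper like the sketch below. ProgressSample is a stand-in for the library's DownloadProgress: only the two fields used in the example are modelled, their integer type is an assumption, and treating a total of 0 as "size unknown" is also an assumption.

```cpp
#include <cstdint>
#include <string>

// Stand-in for DownloadProgress (field types are assumed).
struct ProgressSample {
    std::uint64_t transferred = 0;
    std::uint64_t total = 0;
};

// Formats progress as "NN%" when the total is known, otherwise as a
// plain byte count so the caller never divides by zero.
inline std::string formatProgress(const ProgressSample &p) {
    if (p.total == 0) {
        return std::to_string(p.transferred) + " bytes";
    }
    const std::uint64_t percent = p.transferred * 100 / p.total;
    return std::to_string(percent) + "%";
}
```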

CLI Usage

retroscrape login \
  --screenscraper_user my-user \
  --screenscraper_password my-pass

retroscrape scrape roms/sf2.zip \
  --platform fba \
  --format json

retroscrape scrape roms/sf2.zip \
  --platform fba \
  --download-media video \
  --output sf2-video.mp4

The default backend endpoint credentials are compiled into the project. Callers can still override them explicitly through setBackendAuthentication().

The scraper service account credentials are separate from the backend credentials and are supplied by the caller. The library exposes setAccountAuthentication(), setAccountCredentials(), accountAuthenticationFromEnvironment(), accountAuthenticationFromStorage(), and saveAccountAuthentication().

The CLI stores scraper account credentials with:

retroscrape login \
  --screenscraper_user my-user \
  --screenscraper_password my-pass

It saves them as an ini-style file under the cache root, and later scrape commands load them automatically.

Platform input is backend-neutral. Callers pass strings such as fba, psp, playstation, snes, or megadrive, and retroscrape maps them internally to the backend system id.
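
A lookup of this kind can be sketched as a small table from neutral platform names to backend ids. The function name backendSystemId is hypothetical and the numeric ids are illustrative placeholders; they should be checked against the backend's own system list before being relied on.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Maps a backend-neutral platform name to a backend system id.
// The ids are placeholder values for illustration only.
inline std::optional<int> backendSystemId(std::string_view platform) {
    static const std::unordered_map<std::string, int> table = {
        {"megadrive", 1},
        {"snes", 4},
        {"playstation", 57},
        {"psp", 61},
        {"fba", 75},
    };
    const auto it = table.find(std::string(platform));
    if (it == table.end()) return std::nullopt;  // unsupported platform
    return it->second;
}
```

Keeping the table in one place means adding a second backend only requires a second table, while callers keep passing the same platform strings.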

Scrape output lists available media assets rather than raw provider URLs. Callers download them through downloadMediaAsset() in the library or through --download-media / --download-all-media in the CLI.

Scraped XML is cached by the library. Repeated identical scrapes in the same process reuse an in-memory cache, and successful XML responses are also stored on disk for reuse across runs. The returned GameRecord reports this through cacheStatus with values uncached, cachedMemory, or cachedDisk.
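
The lookup order described above (memory first, then disk, then the backend) can be sketched as follows. This is not the library's implementation: XmlCache, CachedXml and the one-file-per-key layout are assumptions, and the key is assumed to be safe to use as a file name.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <unordered_map>
#include <utility>

enum class CacheStatus { uncached, cachedMemory, cachedDisk };

struct CachedXml {
    std::string xml;
    CacheStatus status;
};

class XmlCache {
public:
    explicit XmlCache(std::filesystem::path dir) : dir_(std::move(dir)) {}

    // Checks memory, then disk, and calls `fetch` only on a full miss.
    CachedXml get(const std::string &key,
                  const std::function<std::string()> &fetch) {
        if (const auto it = memory_.find(key); it != memory_.end()) {
            return {it->second, CacheStatus::cachedMemory};
        }
        const auto file = dir_ / (key + ".xml");
        if (std::ifstream in(file, std::ios::binary); in) {
            std::string xml((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
            memory_[key] = xml;  // promote to the in-memory tier
            return {xml, CacheStatus::cachedDisk};
        }
        std::string xml = fetch();
        std::filesystem::create_directories(dir_);
        std::ofstream(file, std::ios::binary) << xml;  // persist for later runs
        memory_[key] = xml;
        return {xml, CacheStatus::uncached};
    }

private:
    std::filesystem::path dir_;
    std::unordered_map<std::string, std::string> memory_;
};
```

With this shape, a first call reports uncached, a repeat in the same process reports cachedMemory, and a fresh process pointed at the same directory reports cachedDisk.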

Downloaded media assets are also cached on disk by the library, so repeated downloads of the same asset reuse the cached file instead of hitting the backend again.

Cache locations:

Linux: ~/.cache/retroscrape or $XDG_CACHE_HOME/retroscrape
macOS: ~/Library/Caches/retroscrape
Windows: %LOCALAPPDATA%\retroscrape\cache
Fallback: a retroscrape directory under the platform temp directory, such as /tmp/retroscrape

Saved credentials live at credentials.ini under that same cache root.
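
For the Linux case, resolving the cache root and the credentials file could look like the sketch below. The function names are hypothetical, the home-relative default is passed in by the caller rather than hard-coded, and giving $XDG_CACHE_HOME precedence over that default is an assumption, since the README does not state the order.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Picks the first usable candidate: the XDG override, then the
// home-relative default, then a directory under the temp directory.
inline fs::path resolveCacheRoot(const std::optional<std::string> &xdgCacheHome,
                                 const std::optional<fs::path> &homeDefault,
                                 const fs::path &tempDir) {
    if (xdgCacheHome && !xdgCacheHome->empty()) {
        return fs::path(*xdgCacheHome) / "retroscrape";
    }
    if (homeDefault) return *homeDefault;
    return tempDir / "retroscrape";
}

// Saved credentials sit directly under the cache root.
inline fs::path credentialsPath(const fs::path &cacheRoot) {
    return cacheRoot / "credentials.ini";
}
```

In real use the first argument would come from std::getenv("XDG_CACHE_HOME") and the last from std::filesystem::temp_directory_path(); taking them as parameters keeps the function easy to test.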