SKILL FILE

Scrape arXiv with AI

Extract arXiv research papers, abstracts, author data, and citation info using Apify and Claude Code.

50M+ arXiv users or records

1,000 items scraped per minute

$0.10 per 1,000 papers

Download Skill File ↓

CROSS-DEPARTMENT FLOW

How scraped arXiv data flows across your company

One scrape generates intelligence for every department — automatically

Scrape arXiv: research papers, abstracts, author data, and citation info

1 Configure Targets: arXiv URLs, keywords, or filters defined

2 Apify Actor Runs: scraper extracts data at $0.10 per 1,000 papers

3 Data Processed: records cleaned, scored, and categorized

4 Stored in CRM: intelligence pushed to Neon database with attribution

→ Identify prospects from scraped data
→ Track competitor activity
→ Source outreach targets
→ Build lead lists

→ Content research and ideation
→ Competitor strategy analysis
→ Trend monitoring
→ Audience insights

→ Market sizing and analysis
→ Engagement benchmarking
→ Growth opportunity identification
→ Platform trend tracking

→ Data records stored
→ Engagement metrics indexed
→ Source attribution tagged
→ Historical data tracked

Content Outputs

Research Report from marketing

Lead List from sales

Trend Analysis from marketing

Market Report from growth

Everything Tracked

arXiv data collected

Patterns identified

Benchmarks established

Replaces Semantic Scholar Pro

$30/mo → $1/mo

$348/yr saved

REPLACES

Cancel your Semantic Scholar Pro subscription

CANCEL THIS

Semantic Scholar Pro

$30/mo

× Subscription fees
× Data locked in their dashboard
× Per-seat pricing
× Export limits

BUILD THIS

SoloStack + Claude Code

$1/mo

✓ Pay-per-use, no subscription
✓ Your data in your repo
✓ Zero vendor lock-in
✓ Unlimited exports

Save $348/year

WHAT YOU GET

What this skill file teaches Claude

Drop one markdown file into your repo. Claude Code learns how to run this entire workflow.

Data Extraction

Pull key data points from arXiv, including papers, abstracts, author data, and metadata.

Search & Filter

Search by keywords, categories, or specific URLs to target exactly what you need.

Engagement Metrics

Capture the signals available for each paper, such as citation info and submission dates. arXiv itself has no likes, shares, or comments.

Bulk Processing

Process hundreds or thousands of records in a single run with automatic pagination.

Export & Integration

Output clean JSON ready for CRM import, analysis, or integration with other tools.

Apify Actor: epctex/arxiv-scraper · ~$0.10 per 1,000 papers
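The "automatic pagination" behind Bulk Processing comes down to splitting a large run into fixed-size windows. Here is a minimal sketch of that arithmetic. The names `PageWindow` and `pageWindows` are illustrative, not part of the skill file or the Apify actor; `start` and `maxResults` are named after the arXiv API's own paging parameters.

```typescript
// One page request: where to start and how many records to ask for.
interface PageWindow {
  start: number;
  maxResults: number;
}

// Split a run of `total` records into consecutive windows of at most `pageSize`.
// Hypothetical helper: the real actor may page differently.
function pageWindows(total: number, pageSize: number): PageWindow[] {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError("total must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const result: PageWindow[] = [];
  for (let start = 0; start < total; start += pageSize) {
    // The last window is shortened so the run never overshoots `total`.
    result.push({ start, maxResults: Math.min(pageSize, total - start) });
  }
  return result;
}

// Example: 234 records in pages of 100 yields windows of 100, 100, and 34.
const exampleWindows = pageWindows(234, 100);
```

Each window can then be fetched in turn, which keeps a single failed page from invalidating the whole run.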

IN ACTION

Build it with plain English

Tell Claude Code what to do. It handles the rest.

claude — solostack/

you: |

Processing arXiv data...

✓ Data extracted successfully
✓ 234 records collected
✓ Cleaned and deduplicated
✓ Ready for CRM import

Data saved to scrape-arxiv-results.json

you: |

Processing arXiv data...

✓ Data extracted successfully
✓ 567 records collected
✓ Cleaned and deduplicated
✓ Ready for CRM import

Data saved to scrape-arxiv-results.json

you: |

Processing arXiv data...

✓ Data extracted successfully
✓ 89 records collected
✓ Cleaned and deduplicated
✓ Ready for CRM import

Data saved to scrape-arxiv-results.json
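The "Cleaned and deduplicated" step in the output above could look roughly like this. It is a sketch that assumes each record carries an arXiv ID with an optional version suffix (for example `2301.01234v2`). The `PaperRecord` shape and the function names are hypothetical, not the actor's actual output schema.

```typescript
// Assumed record shape; the real scraper output may use different field names.
interface PaperRecord {
  id: string;    // arXiv ID, optionally with a version suffix such as "v2"
  title: string;
}

// Strip a trailing version suffix so all versions of a paper share one key.
function baseArxivId(id: string): string {
  return id.trim().replace(/v\d+$/, "");
}

// Read the version number from the suffix; IDs without one count as version 1.
function versionOf(id: string): number {
  const match = /v(\d+)$/.exec(id.trim());
  return match ? Number(match[1]) : 1;
}

// Keep one record per paper (the highest version seen) and tidy the title.
function dedupePapers(records: PaperRecord[]): PaperRecord[] {
  const latest = new Map<string, PaperRecord>();
  for (const record of records) {
    const key = baseArxivId(record.id);
    const existing = latest.get(key);
    if (!existing || versionOf(record.id) > versionOf(existing.id)) {
      // Collapse runs of whitespace, which often appear in scraped titles.
      latest.set(key, { ...record, title: record.title.replace(/\s+/g, " ").trim() });
    }
  }
  return Array.from(latest.values());
}
```

Keying on the base ID means a revised paper replaces its older version instead of appearing twice in the results file.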

USE CASES

What you can build with this

Research trend monitoring

Track new paper volume by topic to identify emerging research areas before they hit mainstream.

Competitive R&D tracking

Monitor papers from competitor company researchers to understand their R&D direction.

AI/ML trend analysis

arXiv is ground zero for AI research. Track model announcements, benchmark results, and new techniques.

Content creation

Summarize trending research for non-technical audiences in blog posts and newsletters.
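For the research trend monitoring use case above, "paper volume by topic" can be reduced to a count per category per month. The sketch below assumes each record has a primary category and an ISO 8601 publication timestamp. `DatedPaper` and `monthlyVolume` are illustrative names, not part of the skill file.

```typescript
// Assumed record shape for trend counting.
interface DatedPaper {
  primaryCategory: string; // e.g. "cs.LG"
  published: string;       // ISO 8601 timestamp, e.g. "2024-01-05T00:00:00Z"
}

// Count papers per (category, month). Keys look like "cs.LG 2024-01".
function monthlyVolume(papers: DatedPaper[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const paper of papers) {
    const month = paper.published.slice(0, 7); // keep the "YYYY-MM" prefix
    const key = `${paper.primaryCategory} ${month}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Comparing one month's count with the previous month's for the same category gives a simple growth signal for spotting emerging areas.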

IMPORTANT

Things to know

arXiv has a public API and bulk data access. Use official channels when possible.

Paper quality varies — arXiv is a preprint server with no peer review.

Citation counts lag publication by months. Use for trend detection, not impact measurement.
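As an illustration of the official channel mentioned above, here is a sketch that builds a request URL for arXiv's public query API. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API. The `ArxivQuery` type, the function name, and the defaults are assumptions made for this example.

```typescript
// Documented arXiv API query endpoint; responses are Atom XML.
const ARXIV_API_BASE = "http://export.arxiv.org/api/query";

// Hypothetical options type for this sketch.
interface ArxivQuery {
  searchQuery: string; // e.g. "cat:cs.LG AND all:diffusion"
  start?: number;
  maxResults?: number;
  sortBy?: "relevance" | "lastUpdatedDate" | "submittedDate";
  sortOrder?: "ascending" | "descending";
}

// Build the request URL. Defaults here are choices for the example, not API defaults.
function buildArxivQueryUrl(query: ArxivQuery): string {
  const params = new URLSearchParams({
    search_query: query.searchQuery,
    start: String(query.start ?? 0),
    max_results: String(query.maxResults ?? 10),
    sortBy: query.sortBy ?? "submittedDate",
    sortOrder: query.sortOrder ?? "descending",
  });
  return `${ARXIV_API_BASE}?${params.toString()}`;
}

// Example: newest machine-learning papers mentioning "diffusion".
const exampleUrl = buildArxivQueryUrl({
  searchQuery: "cat:cs.LG AND all:diffusion",
  maxResults: 100,
});
```

The resulting URL can be fetched with any HTTP client. Parsing the Atom response is left out of this sketch.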

COMPLETE SKILL FILE

Get the full skill file

Everything above is 80% of the skill file. Download the complete version with full implementation details, agent prompts, and ready-to-run scripts.

FAQ

Common questions

Is scraping arXiv legal?

Scraping publicly available data is a legal gray area, and the answer depends on jurisdiction and on how the data is used. arXiv offers a public API and bulk data access, so prefer those official channels. Always respect the platform's ToS, use data for internal research only, and comply with GDPR/CCPA when handling personal information.

How often should I re-scrape?

For trend monitoring, weekly scrapes capture meaningful changes. For competitive analysis, every two to four weeks is sufficient. The optimal frequency depends on how quickly data changes on the platform.

What if the scraper gets blocked?

The Apify actor uses residential proxies and request throttling to minimize blocks. If you experience issues, reduce request volume, increase delays between requests, and consider running scrapes during off-peak hours.
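If you run your own requests, "increase delays between requests" is often implemented as exponential backoff. The sketch below is a generic version of that pattern, not the Apify actor's internal behavior. The function names, the 1-second base, and the 60-second cap are assumptions.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Run `task`, retrying on failure with growing pauses between attempts.
// Hypothetical wrapper: `task` would be whatever request function you use.
async function withRetries<T>(task: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // Wait before the next attempt; no wait after the final failure.
        const delay = backoffDelayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

With the defaults, five attempts wait 1, 2, 4, and 8 seconds between tries, which lowers request volume when the target is pushing back.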

Can I integrate this with my CRM?

Yes. The output is clean JSON that can be directly imported into Neon (Postgres), Airtable, or any CRM with an API. Use the TypeScript integration code in the skill file to automate the pipeline.
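To make the JSON-to-Postgres step concrete, here is a sketch that turns scraped records into one parameterized upsert statement. It is not the integration code shipped in the skill file. The table `arxiv_papers`, its columns, the unique constraint on `arxiv_id`, and the `PaperRow` shape are all hypothetical. The output is a text-plus-values pair, the form that parameterized queries take in Postgres clients such as node-postgres.

```typescript
// Assumed row shape; adjust to the actual scraper output.
interface PaperRow {
  arxivId: string;
  title: string;
  abstract: string;
  authors: string[];
  source: string; // attribution tag, e.g. the actor that produced the row
}

interface SqlStatement {
  text: string;
  values: unknown[];
}

// Build a multi-row upsert for a hypothetical `arxiv_papers` table.
// Assumes a unique constraint on arxiv_id and a JSON-compatible authors column.
function buildUpsert(rows: PaperRow[]): SqlStatement {
  if (rows.length === 0) {
    throw new Error("rows must not be empty");
  }
  const columns = ["arxiv_id", "title", "abstract", "authors", "source"];
  const values: unknown[] = [];
  const tuples = rows.map((row, rowIndex) => {
    values.push(row.arxivId, row.title, row.abstract, JSON.stringify(row.authors), row.source);
    // Placeholders are numbered $1, $2, ... across all rows.
    const placeholders = columns.map(
      (_, colIndex) => `$${rowIndex * columns.length + colIndex + 1}`
    );
    return `(${placeholders.join(", ")})`;
  });
  const updates = columns
    .filter((column) => column !== "arxiv_id")
    .map((column) => `${column} = EXCLUDED.${column}`)
    .join(", ");
  const text =
    `INSERT INTO arxiv_papers (${columns.join(", ")}) VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (arxiv_id) DO UPDATE SET ${updates}`;
  return { text, values };
}
```

Using placeholders instead of string concatenation keeps scraped titles and abstracts from being interpreted as SQL.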

How much does it cost to run?

Apify charges ~$0.10 per 1,000 papers. A typical research run costs $1-5 depending on volume. Compare that to SaaS alternatives at $30/mo: at about $1/mo, you save $348/yr.
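The arithmetic behind those figures, as a small sketch. The rate and the monthly prices are the ones quoted on this page; the function names are illustrative.

```typescript
// Quoted rate: about $0.10 per 1,000 papers.
const PRICE_PER_1000_PAPERS_USD = 0.1;

// Estimated cost of one run, rounded to whole cents.
function estimateRunCostUsd(paperCount: number): number {
  const raw = (paperCount / 1000) * PRICE_PER_1000_PAPERS_USD;
  return Math.round(raw * 100) / 100;
}

// Yearly difference between two monthly prices.
function annualSavingsUsd(currentMonthlyUsd: number, newMonthlyUsd: number): number {
  return (currentMonthlyUsd - newMonthlyUsd) * 12;
}

const smallRun = estimateRunCostUsd(10000); // 10,000 papers -> $1
const largeRun = estimateRunCostUsd(50000); // 50,000 papers -> $5
const savings = annualSavingsUsd(30, 1);    // ($30 - $1) x 12 = $348
```

So the quoted $1-5 range corresponds to roughly 10,000 to 50,000 papers per run.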

RELATED SKILLS

Keep building your stack

Scrape Academia

Scrape Wikipedia

Scrape Google Search

Related Solutions

Free CRM

Free Email Marketing

Free Scheduling

Free Website Builder

Ready to automate?

SoloStack gives you every skill pre-installed: scraping, marketing, sales, CRM, and more. One repo. Every department.

Book a Call →