Open source · AI PM hiring · 2026

The 2026 AI PM Evaluation Framework

A rigorous, open-source way to evaluate Applied AI Product Manager candidates in an era of weekly model releases. Built for teams who need builders, not coordinators.

Pillars: 6
Scoring rubric: 60 pts
Decision tiers: 4
License: Open
Why I built this

From a private checklist to an open framework

A conversation on Aakash Gupta's podcast with Jaclyn Konzelmann, Google's Director of AI Product, genuinely inspired me. Jaclyn's evaluation framework was so clear and rigorous that my first thought was simple: I need to measure myself against this.

In disaster relief, you don't map safe routes and hide them. You share them.

I started outlining her criteria as a self-assessment, then realized it shouldn't be a private checklist. So it became a framework with a purpose: give everyone, especially people from non-traditional backgrounds, an actionable guide to what to build, where to focus, and how to position themselves for AI PM roles.

If you have edits or ideas to improve it, please open an issue or pull request on GitHub.


Philosophy

Why traditional PM evaluation fails AI builders

Four principles that separate people who ship AI products from people who coordinate them.

Builders over coordinators

The era of pure project management is over. AI PMs in 2026 ship code, prototype in hours, and show technical depth through real projects.

Key shift: from "managed a team that built X" to "I built X in a weekend."

Velocity as a core competency

Speed is not optional; it is existential. AI capabilities evolve weekly, so PMs must prototype, test, and iterate faster than the technology changes.

Evidence required: a portfolio of 8 to 10 concurrent side projects, built in days, not months.

Deep AI intuition

Surface awareness is not enough. Real intuition comes from hands-on building: understanding model limitations, prompt engineering, and architectural tradeoffs.

Non-negotiable: personal AI projects that apply LLMs, agents, or workflows in creative ways.

Building in public

The best AI PMs share their learning journey openly, through blogs, repositories, demos, and thought leadership that helps other people build.

Signal: an active GitHub, a technical blog, or regular AI experimentation shared in the open.

The 2026 paradigm shift: we are not hiring people to manage AI product development. We are hiring people who can build AI products themselves, then scale that capability through teams.

The assessment

The six pillars

A complete evaluation across the full spectrum of AI PM competencies.

01

Technical skills & hands-on building

Can they actually build things, or only talk about building?

  • Recent hands-on coding (GitHub activity, personal projects)
  • Personal AI tools, agents, or workflows built and shipped
  • Ability to prototype ideas in hours, not weeks
  • Comfort with modern development tools and workflows
  • Technical curiosity shown through experimentation
Red flag: no repositories, no personal projects, and no code written in the last five years.
02

Product thinking & 0-to-1 leadership

Have they taken something from idea to shipped product?

  • Clear examples of 0-to-1 product launches
  • User-centric problem definition and validation
  • Comfort navigating ambiguity and incomplete information
  • Evidence of product taste and design sensibility
  • Metrics-driven decision making
Strong signal: multiple products launched from scratch with measurable user impact.
03

AI/ML knowledge & deep intuition

Do they understand AI from building, not just from reading?

  • Personal AI projects that demonstrate model understanding
  • Knowledge of current capabilities and limitations
  • Experience with prompt engineering, fine-tuning, or agents
  • Creative applications of AI to real problems
  • Stays current through active experimentation, not passive reading
Critical test: can they explain why they chose one model over another (say, GPT over Claude or Gemini) for a specific use case?
04

Communication & building in public

Do they share their journey and help others build?

  • Active technical blog, Substack, or public documentation
  • Repositories with clear READMEs and demos
  • Thought leadership that advances the field
  • Compelling storytelling and narrative structure
  • Ability to explain complex technical concepts simply
Differentiator: an engaged audience of 5,000+ followers, built by regularly sharing AI building insights.
05

Strategic thinking & second-order vision

Do they build platforms that enable others, or only first-order features?

  • Platform thinking: tools that get better as AI improves
  • Understanding of ecosystem dynamics and network effects
  • Long-term vision balanced with rapid iteration
  • Ability to identify leverage points and force multipliers
  • Systems thinking applied to product architecture
Question: are they building for today's AI, or for tomorrow's?
06

Execution & rapid shipping

Do they treat ideas as cheap and execution as everything?

  • A portfolio of projects shipped in days, not months
  • 8 to 10 concurrent side projects that show breadth
  • Bias toward action over analysis paralysis
  • Comfortable with imperfect v1s and rapid iteration
  • The language of building: "I shipped," not "I managed"
Litmus test: can they build and ship a working demo in a weekend?
The decision

The decision framework

How candidates move through the 2026 standard, from minimum thresholds to a final score.

Minimum thresholds (pass / fail)

Personal AI projects: at least one visible AI project with code or a demo.

Building in public: evidence of sharing work (GitHub, blog, demos).

Resume creativity: product taste beyond a standard LinkedIn template.

Fail any threshold → No Screen

Red flags (disqualifiers)

Job hopping without a clear narrative, inflated titles, vague responsibilities, no concrete metrics, plagiarism, or misrepresentation.

Any red flag → No Screen

Must-have signals (5 of 5 required)

  1. Evidence of continuous learning and staying current with AI
  2. At least one personal AI project with evidence
  3. Experience shipping products, not just planning them
  4. A compelling narrative explaining their journey
  5. Clear alignment with the AI/ML product space
Missing any → Maybe, at best

Differentiation signals (3 or more for a strong screen)

  • 8 to 10 concurrent side projects
  • Built something in hours or days, not months
  • An active technical blog or significant following
  • Open-source contributions or community leadership
  • Platform or framework thinking in past work
  • Conference speaking or thought leadership
  • A unique background or unconventional path
  • Evidence of rapid prototyping velocity
3 or more signals → Strong Screen
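The gates above can be read as an ordered pipeline. The sketch below is one hypothetical encoding in Python: the criterion keys (`personal_ai_project`, `continuous_learning`, and so on) and the `ScreenInput` type are illustrative names, not part of the framework. It treats "Maybe, at best" as a ceiling, and because the framework does not name an outcome for a candidate who clears every must-have but shows fewer than three differentiation signals, it assumes that case falls through to the 60-point rubric.

```python
from dataclasses import dataclass, field

# Hypothetical keys paraphrasing the framework's criteria.
THRESHOLDS = ("personal_ai_project", "building_in_public", "resume_creativity")
MUST_HAVES = (
    "continuous_learning",
    "personal_ai_project_evidence",
    "shipping_experience",
    "compelling_narrative",
    "ai_ml_alignment",
)


@dataclass
class ScreenInput:
    thresholds: dict[str, bool] = field(default_factory=dict)
    red_flags: list[str] = field(default_factory=list)
    must_haves: dict[str, bool] = field(default_factory=dict)
    differentiation_signals: set[str] = field(default_factory=set)


def screen_outcome(c: ScreenInput) -> str:
    """Apply the gates in order; a missing key counts as a failed check."""
    if not all(c.thresholds.get(name, False) for name in THRESHOLDS):
        return "No Screen"  # failed a pass/fail threshold
    if c.red_flags:
        return "No Screen"  # any red flag disqualifies
    if not all(c.must_haves.get(name, False) for name in MUST_HAVES):
        return "Maybe (at best)"  # all 5 must-haves are required
    if len(c.differentiation_signals) >= 3:
        return "Strong Screen"
    # Assumption: the framework names no outcome here, so use the rubric score.
    return "Defer to scoring"


strong = ScreenInput(
    thresholds={k: True for k in THRESHOLDS},
    must_haves={k: True for k in MUST_HAVES},
    differentiation_signals={"side_projects", "technical_blog", "open_source"},
)
print(screen_outcome(strong))  # Strong Screen
```

Checking red flags only after the thresholds mirrors the page's order; since both lead to No Screen, the order does not change the result.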

Scoring system

Pillar · Weight (max score) · Evaluation focus
Technical skills · 10 points · GitHub activity, personal projects, code quality
Product thinking · 10 points · 0-to-1 launches, user impact, metrics
AI/ML knowledge · 10 points · Personal AI projects, hands-on evidence
Communication · 10 points · Public building, blog, thought leadership
Strategic thinking · 10 points · Platform thinking, second-order effects
Execution · 10 points · Shipping velocity, portfolio breadth
Total · 60 points · Aggregate across all pillars

Decision tiers (total score)
Below 25 → No Screen
25–34 → Maybe
35–44 → Screen
45 or more → Strong Screen

Aggregate score across all six pillars, out of a maximum of 60 points.
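A minimal sketch of the rubric arithmetic, assuming each pillar is scored on a 0 to 10 scale. The pillar keys are hypothetical labels. The rubric lists integer bands, so how a fractional total such as 34.5 should be tiered is an assumption here: the half-open comparisons place it in the lower band.

```python
PILLARS = (
    "technical_skills",
    "product_thinking",
    "ai_ml_knowledge",
    "communication",
    "strategic_thinking",
    "execution",
)


def total_score(scores: dict[str, float]) -> float:
    """Sum the six pillar scores, each expected on a 0-10 scale."""
    missing = [p for p in PILLARS if p not in scores]
    if missing:
        raise ValueError(f"missing pillar scores: {missing}")
    for pillar in PILLARS:
        if not 0 <= scores[pillar] <= 10:
            raise ValueError(f"{pillar} must be between 0 and 10")
    return sum(scores[p] for p in PILLARS)


def tier(total: float) -> str:
    """Map a 0-60 total to the four decision tiers."""
    if total < 25:
        return "No Screen"
    if total < 35:  # 25-34
        return "Maybe"
    if total < 45:  # 35-44
        return "Screen"
    return "Strong Screen"  # 45+


scores = {"technical_skills": 8, "product_thinking": 7, "ai_ml_knowledge": 8,
          "communication": 6, "strategic_thinking": 6, "execution": 7}
print(total_score(scores), tier(total_score(scores)))  # 42 Screen
```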

In practice

How to use this framework

Integrate the evaluation system into your existing hiring process.

Resume screening

Use the automated analyzer for AI-powered analysis from multiple providers (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro):

bin/analyze --deep-analysis resume.pdf

The analyzer generates HTML reports with consensus scoring and detailed pillar breakdowns.
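The page does not describe how `bin/analyze` combines the models' outputs into a consensus score. If you want to reproduce the idea by hand, one hypothetical rule is the median per pillar, which keeps a single outlier model from dragging the result, plus a flag for pillars where the models disagree widely. The model labels and scores below are made up for illustration, and the disagreement threshold is an arbitrary assumption.

```python
from statistics import median

# Hypothetical: spread (max - min) above which a pillar goes to human review.
DISAGREEMENT_THRESHOLD = 3


def consensus(per_model: dict[str, dict[str, float]]) -> dict[str, float]:
    """Median score per pillar across every model that scored that pillar."""
    pillars = sorted({p for scores in per_model.values() for p in scores})
    return {
        p: median(scores[p] for scores in per_model.values() if p in scores)
        for p in pillars
    }


def disagreements(per_model: dict[str, dict[str, float]]) -> list[str]:
    """Pillars where the models' scores differ by more than the threshold."""
    flagged = []
    pillars = sorted({p for scores in per_model.values() for p in scores})
    for p in pillars:
        values = [scores[p] for scores in per_model.values() if p in scores]
        if max(values) - min(values) > DISAGREEMENT_THRESHOLD:
            flagged.append(p)
    return flagged


runs = {
    "model_a": {"technical_skills": 8, "execution": 7},
    "model_b": {"technical_skills": 6, "execution": 3},
    "model_c": {"technical_skills": 7, "execution": 8},
}
print(consensus(runs))      # {'execution': 7, 'technical_skills': 7}
print(disagreements(runs))  # ['execution']
```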

Interview panels

Share the framework with all interviewers beforehand. Use pillar-specific questions to probe each area:

  • "Walk me through your GitHub repositories."
  • "What did you build last weekend?"
  • "Show me your AI experiments."

Calibration

Run multiple candidates through the framework and compare scores. Calibrate your team's shared sense of what a "Strong Screen" looks like in practice.
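One way to make the score comparison concrete is to measure each interviewer's average deviation from the panel mean across the candidates the panel scored: a consistently positive offset suggests a lenient rater, a negative one a harsh rater. This is a common calibration heuristic, not something the framework prescribes; the interviewer names and totals are hypothetical.

```python
from statistics import mean


def rater_offsets(totals: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average deviation of each interviewer from the panel mean.

    totals[candidate][interviewer] holds that interviewer's 0-60 total.
    A positive offset means the interviewer scores more generously than
    the rest of the panel; a negative offset means more harshly.
    """
    deviations: dict[str, list[float]] = {}
    for by_interviewer in totals.values():
        panel_mean = mean(list(by_interviewer.values()))
        for interviewer, score in by_interviewer.items():
            deviations.setdefault(interviewer, []).append(score - panel_mean)
    return {who: round(float(mean(d)), 2) for who, d in sorted(deviations.items())}


totals = {
    "candidate_1": {"ana": 42, "ben": 36, "chen": 39},
    "candidate_2": {"ana": 30, "ben": 24, "chen": 27},
}
print(rater_offsets(totals))  # {'ana': 3.0, 'ben': -3.0, 'chen': 0.0}
```

A persistent offset is a prompt for discussion, not an automatic correction: it may mean one interviewer reads "Strong Screen" differently from the others.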

Customization

Fork the repository and adapt the framework: adjust weights, add custom criteria, or modify the scoring rubric.
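If you adjust the weights, rescaling the weighted sum back to the original 0 to 60 range lets the existing tier cutoffs (25, 35, 45) keep their meaning. The sketch below assumes the six pillars are still scored 0 to 10; the pillar keys and the example weighting are hypothetical.

```python
PILLARS = (
    "technical_skills", "product_thinking", "ai_ml_knowledge",
    "communication", "strategic_thinking", "execution",
)


def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of 0-10 pillar scores, rescaled to the 0-60 range."""
    if set(weights) != set(PILLARS):
        raise ValueError("provide exactly one weight per pillar")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = sum(scores[p] * weights[p] for p in PILLARS)
    # A perfect candidate earns 10 * weight_sum raw points; scale that to 60.
    return round(raw * 60 / (10 * weight_sum), 2)


equal = {p: 1 for p in PILLARS}
tech_heavy = {**equal, "technical_skills": 2, "ai_ml_knowledge": 2}
scores = {"technical_skills": 9, "product_thinking": 5, "ai_ml_knowledge": 9,
          "communication": 5, "strategic_thinking": 5, "execution": 5}
print(weighted_total(scores, equal))       # 38.0
print(weighted_total(scores, tech_heavy))  # 42.0
```

With equal weights the result matches the plain 60-point total; doubling the two technical pillars moves this example candidate from 38 to 42, still inside the Screen band.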

Feedback and improvements are welcome. Open an issue or pull request to help make it better for everyone.

Ready to raise your hiring bar?

Start evaluating AI PM candidates against the 2026 standard. Open source, free to use, and continuously evolving.