Research Paper Library

open

A separate surface for papers, with notes, reproducibility links, source-credibility checks, an AI summary feature, and runnable Jupyter notebooks where I have reproduced the work.

Proposed 2026-05-17. Tags: research, library, reproducibility, ai-summary, jupyter

The current Library surface treats every PDF the same way: a book entry with title, author, status, cover. That works for books. It does not work for research papers, because the things a reader (or I) wants to do with a paper are different from the things they want to do with a book.

This is a proposal for a separate Research Papers surface, with its own philosophy, its own schema, and its own affordances.

Why papers are not books

You read a book. You finish or you do not. The note-taking is optional. The reading is mostly linear. The library cares mostly about what you have read and what you thought of it.

A paper is closer to a small machine. You read it, but the meaningful interaction is what happens around it. You take notes. You evaluate the source. You reflect critically. You sometimes reproduce the work, in math or in code. You sometimes find the paper is bad: written by an author who has been called out elsewhere, or published in a journal that nobody respects. You sometimes find it is foundational and you build the rest of your understanding on top of it.

A paper, in other words, has a lot more associated artifacts than a book.

The surface

A new top-level route at /research (subdomain optional, see below). Each paper gets its own detail page. The fields per paper:

Title, authors, year, journal or venue
DOI / arXiv ID / publisher URL
Tags by topic
Status: reading, finished, abandoned, reproducing
My notes (markdown, as detailed as I want)
Critical evaluation: what is wrong, what is unclear, what I would improve
Reproducibility section: linked documents, code repository URL, notebook URL, datasets
Source-credibility summary: how I checked the author and publisher, what came back
AI summary, generated on demand, clearly labelled as AI generated

The structure makes the surface useful as a research log, not just a reading list.

AI summary feature

Every paper entry has a button that produces an AI summary of the paper itself. The summary is:

Clearly labelled as machine-generated, with the model and date
Linked back to the source paper so a reader can verify
Editable by me when the AI gets something wrong
Cached so we are not regenerating on every page load

The implementation is straightforward: pass the paper text (or, for open-access papers, the full text fetched via a DOI lookup) to a model, store the response, and render it under a clearly visible header that distinguishes it from my own notes. The discipline is that the summary never replaces reading. It is a fast index for visitors and for my own future self.
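
A minimal sketch of the store-once, label-clearly behaviour described above. The names `AISummary`, `SummaryCache`, and `render_label` are hypothetical, the cache is in-memory only, and `summarize(model, paper_text)` stands in for whatever model call ends up being used; none of this is a decided design.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class AISummary:
    text: str              # the summary shown on the page
    model: str             # model name, shown in the label
    generated_at: str      # ISO 8601 timestamp, shown in the label
    source_url: str        # link back to the paper so a reader can verify
    edited: bool = False   # True once the text has been corrected by hand


class SummaryCache:
    """In-memory cache keyed by a hash of (model, paper text)."""

    def __init__(self, summarize: Callable[[str, str], str]):
        # `summarize` is a hypothetical stand-in for the real model call.
        self._summarize = summarize
        self._store: Dict[str, AISummary] = {}

    @staticmethod
    def _key(model: str, paper_text: str) -> str:
        return hashlib.sha256(f"{model}\n{paper_text}".encode("utf-8")).hexdigest()

    def get_or_generate(self, paper_text: str, model: str, source_url: str) -> AISummary:
        key = self._key(model, paper_text)
        if key not in self._store:  # only call the model on a cache miss
            self._store[key] = AISummary(
                text=self._summarize(model, paper_text),
                model=model,
                generated_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
                source_url=source_url,
            )
        return self._store[key]

    def edit(self, paper_text: str, model: str, corrected_text: str) -> AISummary:
        # Hand corrections overwrite the cached text and are flagged in the label.
        summary = self._store[self._key(model, paper_text)]
        summary.text = corrected_text
        summary.edited = True
        return summary


def render_label(summary: AISummary) -> str:
    label = f"AI-generated summary ({summary.model}, {summary.generated_at[:10]})"
    if summary.edited:
        label += ", edited by hand"
    return label
```

Keying on the model as well as the text means switching models produces a fresh summary instead of silently reusing an old one. A real version would persist the store next to the paper entry instead of holding it in memory.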

Source-credibility meta-search

Adjacent feature, possibly its own ideas entry later: given a paper, automatically check the author(s) and publisher against known signals.

Predatory publisher lists and publisher-level flags (the archived Beall's list, flags raised about MDPI, etc.)
Author affiliation, h-index, citation count
Retractions database for the author
Journal impact factor and indexing status
Conference reputation if applicable

Output a credibility card alongside the paper that says: this author has had two retractions, this journal is on the predatory list, proceed with caution. Or: this is a top-tier publication, well-cited, no flags. The point is to do the check once and store it, not have every visitor wonder whether the source is real.
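
One way the credibility card could be assembled once the signals have been collected, as a sketch only. `CredibilitySignals` and `credibility_card` are hypothetical names, the choice of fields is an assumption drawn from the list above, and the lookups that would fill them in (retraction counts, list membership, indexing) are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CredibilitySignals:
    # All fields are assumed inputs, filled in by hand or by later automated checks.
    author_retractions: int = 0
    publisher_on_predatory_list: bool = False
    venue_indexed: bool = True
    citation_count: int = 0
    checked_on: str = ""  # date of the check, stored so it is done once


def credibility_card(s: CredibilitySignals) -> str:
    """Turn stored signals into the one-line card shown beside the paper."""
    flags: List[str] = []
    if s.author_retractions > 0:
        noun = "retraction" if s.author_retractions == 1 else "retractions"
        flags.append(f"author has {s.author_retractions} {noun}")
    if s.publisher_on_predatory_list:
        flags.append("publisher is on a predatory list")
    if not s.venue_indexed:
        flags.append("venue is not indexed")
    if flags:
        return "Proceed with caution: " + "; ".join(flags) + "."
    return f"No flags found ({s.citation_count} citations)."


card = credibility_card(
    CredibilitySignals(author_retractions=2, publisher_on_predatory_list=True)
)
```

The card text is derived from stored signals, so it can be regenerated without re-running the checks.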

Reproducibility, when it applies

For papers where I have reproduced the work in code:

The Jupyter notebook lives at a URL on the site (or subdomain, see below)
A visitor can open the notebook in the browser, read the code alongside the paper, and download the .ipynb file
If we can serve the notebook live with a kernel attached (JupyterLite, marimo, or a hosted JupyterHub), even better. Failing that, the download path is the floor.
The notebook is linked from the paper entry, and vice versa
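
For the read-in-the-browser floor, a static view needs nothing more than the .ipynb file itself, which is JSON. The sketch below pulls out the code cells so they can be shown alongside the paper. It assumes the nbformat 4 layout (a top-level `cells` list whose items carry `cell_type` and `source`); `code_cells` and the tiny `example` notebook are hypothetical.

```python
import json
from typing import List


def code_cells(ipynb_text: str) -> List[str]:
    """Return the source of each code cell in a notebook (nbformat 4 layout)."""
    notebook = json.loads(ipynb_text)
    cells: List[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows `source` to be one string or a list of line strings
        if isinstance(source, list):
            source = "".join(source)
        cells.append(source)
    return cells


# A two-cell notebook written out by hand, purely for illustration.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Reproduction\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
})

cells = code_cells(example)
```

This only reads the file; it does not execute anything, so it works regardless of which live-kernel option is chosen later.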

For papers where I have done the math by hand instead of code, the equivalent is a derivation notebook (scanned or LaTeX) linked from the paper.

SEO and subdomain question

Two options for the URL shape:

husayngokal.com/research — same domain, same auth context, simpler. The papers benefit from the rest of the site's authority. Easier to cross-link from the Notebook and Mental Models surfaces.
research.husayngokal.com — subdomain. Cleaner separation in search engines, can have its own sitemap and submission to Google Scholar. Slightly more setup.

My current bias: do /research first, and consider the subdomain when the corpus is big enough that the SEO separation actually matters. The risk of a premature subdomain split is that early on each side looks empty and Google treats both as low-authority.

Schema for the vault

Similar to the schema for library books, but with additional fields:

title: ...
authors: [...]
year: ...
venue: ...                # journal, conference, arxiv, etc.
doi: ...                  # or arxiv-id
status: reading | finished | abandoned | reproducing
url: ...                  # publisher URL
notebook-url: ...         # optional, where the reproduction lives
code-repo: ...            # optional, GitHub URL
credibility: ...          # short summary of the meta-search result
ai-summary-generated-at: ...
ai-summary-model: ...
tags: [...]

The body of the markdown file carries my notes, critical evaluation, and reproduction log.
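
A sketch of how the parser entry might validate this frontmatter once it has been loaded into a dictionary. `Paper`, `STATUSES`, and `paper_from_frontmatter` are hypothetical names, and the choice of which fields are required is an assumption, not part of the schema above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

STATUSES = {"reading", "finished", "abandoned", "reproducing"}

# Assumption: these five are required; everything else is optional.
REQUIRED = ("title", "authors", "year", "venue", "status")


@dataclass
class Paper:
    title: str
    authors: List[str]
    year: int
    venue: str
    status: str
    doi: Optional[str] = None
    url: Optional[str] = None
    notebook_url: Optional[str] = None
    code_repo: Optional[str] = None
    credibility: Optional[str] = None
    ai_summary_generated_at: Optional[str] = None
    ai_summary_model: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def paper_from_frontmatter(meta: Dict[str, Any]) -> Paper:
    # Frontmatter keys use hyphens; dataclass fields use underscores.
    fields = {key.replace("-", "_"): value for key, value in meta.items()}
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if fields["status"] not in STATUSES:
        raise ValueError(f"unknown status: {fields['status']!r}")
    # Keys that are not in the schema raise TypeError here, which surfaces typos.
    return Paper(**fields)


paper = paper_from_frontmatter({
    "title": "An example paper",
    "authors": ["A. Author"],
    "year": 2020,
    "venue": "arXiv",
    "status": "reproducing",
    "notebook-url": "/research/notebooks/example.ipynb",
    "tags": ["example"],
})
```

Rejecting unknown statuses and unknown keys at parse time keeps a typo in one vault file from quietly producing a half-empty detail page.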

What this would replace

Right now my reMarkable import drops every PDF into the Library, including dozens of academic papers. Those should move to /research and out of /library entirely. The Library page would become books only, which is what a Library should be.

Next steps if this becomes Building

Schema and migration
/research index and /research/[slug] detail routes
Vault folder + parser entry
Reproduction-notebook hosting decision (URL on the same server vs. JupyterLite vs. Binder)
Source-credibility meta-search MVP (manual at first, automated checks layered on)
AI summary endpoint
Migration: move existing academic-paper entries from /library to /research

The full feature set is a separate scope. The smallest useful version is just the surface with hand-entered metadata and notes. Everything else (AI summary, credibility check, hosted notebooks) is a layer on top.