Research Paper Library
Status: open
A separate surface for papers, with notes, reproducibility links, source-credibility checks, an AI summary feature, and runnable Jupyter notebooks where I have reproduced the work.
Proposed 2026-05-17. Tags: research, library, reproducibility, ai-summary, jupyter
The current Library surface treats every PDF the same way: a book entry with title, author, status, cover. That works for books. It does not work for research papers, because the things a reader (or I) wants to do with a paper are different from the things they want to do with a book.
This is a proposal for a separate Research Papers surface, with its own philosophy, its own schema, and its own affordances.
Why papers are not books
You read a book. You finish or you do not. The note-taking is optional. The reading is mostly linear. The library cares mostly about what you have read and what you thought of it.
A paper is closer to a small machine. You read it, but the meaningful interaction is what happens around it. You take notes. You evaluate the source. You reflect critically. You sometimes reproduce the work, in math or in code. You sometimes find the paper is bad, by an author who has been called out elsewhere, published in a journal that nobody respects. You sometimes find it is foundational and you build the rest of your understanding on top of it.
A paper, in other words, has a lot more associated artifacts than a book.
The surface
A new top-level route at /research (subdomain optional, see below). Each paper gets its own detail page. The fields per paper:
- Title, authors, year, journal or venue
- DOI / arXiv ID / publisher URL
- Tags by topic
- Status: reading, finished, abandoned, reproducing
- My notes (markdown, as detailed as I want)
- Critical evaluation: what is wrong, what is unclear, what I would improve
- Reproducibility section: linked documents, code repository URL, notebook URL, datasets
- Source-credibility summary: how I checked the author and publisher, what came back
- AI summary, generated on demand, clearly labelled as AI generated
The structure makes the surface useful as a research log, not just a reading list.
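The field list maps naturally onto a typed record. Below is a minimal Python sketch. Python is used because the reproductions already live in Jupyter. The class and method names are hypothetical. The sketch assumes a DOI link is preferred over an arXiv link when a paper has both.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # The four statuses a paper entry can have.
    READING = "reading"
    FINISHED = "finished"
    ABANDONED = "abandoned"
    REPRODUCING = "reproducing"


@dataclass
class Paper:
    title: str
    authors: list[str]
    year: int
    venue: str  # journal, conference, arXiv, etc.
    status: Status
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    url: Optional[str] = None  # publisher URL
    tags: list[str] = field(default_factory=list)

    def canonical_link(self) -> Optional[str]:
        """Prefer a DOI resolver link, then arXiv, then the publisher URL."""
        if self.doi:
            return f"https://doi.org/{self.doi}"
        if self.arxiv_id:
            return f"https://arxiv.org/abs/{self.arxiv_id}"
        return self.url
```

Storing the status as an Enum means an entry like Status("reproducing") parses cleanly from the vault. A typo such as "reproduced" fails loudly instead of silently creating a fifth status.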
AI summary feature
Every paper entry has a button that produces an AI summary of the paper itself. The summary is:
- Clearly labelled as machine-generated, with the model and date
- Linked back to the source paper so a reader can verify
- Editable by me when the AI gets something wrong
- Cached so we are not regenerating on every page load
The implementation is straightforward. Pass the paper text to a model; for open-access papers, the text can be fetched via a DOI lookup. Store the response, and render it under a clearly visible header that distinguishes it from my own notes. The discipline is that the summary never replaces reading; it is a fast index for visitors and for my own future self.
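The cache-and-label behaviour can be sketched in a few lines of Python. The model call is abstracted as a `generate(model, paper_text)` callable, which is a hypothetical stand-in for whatever model API is used. The in-memory dict stands in for wherever summaries are actually persisted. All names here are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AISummary:
    text: str
    model: str            # shown in the "AI generated" label
    generated_at: str     # ISO 8601 timestamp, also shown in the label
    source_url: str       # link back to the paper so readers can verify
    edited: bool = False  # set when the machine output is corrected by hand


class SummaryStore:
    """Cache of summaries keyed by paper slug; a real site would persist this."""

    def __init__(self, generate: Callable[[str, str], str]):
        # generate(model, paper_text) -> summary text (hypothetical model call)
        self._generate = generate
        self._cache: dict[str, AISummary] = {}

    def get_or_create(self, slug: str, paper_text: str, model: str, source_url: str) -> AISummary:
        cached = self._cache.get(slug)
        if cached is not None:
            return cached  # page loads never trigger regeneration
        summary = AISummary(
            text=self._generate(model, paper_text),
            model=model,
            generated_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
            source_url=source_url,
        )
        self._cache[slug] = summary
        return summary

    def edit(self, slug: str, new_text: str) -> None:
        # Manual corrections overwrite the text but keep the model/date label.
        summary = self._cache[slug]
        summary.text = new_text
        summary.edited = True
```

Keeping `edited` as a separate flag lets the page say "AI generated, corrected by hand" rather than hiding that a human touched the output.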
Source-credibility meta-search
Adjacent feature, possibly its own ideas entry later: given a paper, automatically check the author(s) and publisher against known signals.
- Predatory publisher lists (Beall's list, MDPI flags, etc.)
- Author affiliation, h-index, citation count
- Retraction records for the author (e.g. the Retraction Watch database)
- Journal impact factor and indexing status
- Conference reputation if applicable
Output a credibility card alongside the paper that says, for example: this author has had two retractions, this journal is on a predatory list, proceed with caution. Or: this is a top-tier publication, well cited, no flags. The point is to run the check once and store it, rather than leave every visitor wondering whether the source is real.
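Turning stored check results into that card is simple once the signals are recorded. Here is a Python sketch covering only the two red flags mentioned above. The field names and wording are assumptions. The lookups themselves (predatory lists, retraction databases) are out of scope and would be filled in by hand at first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CredibilitySignals:
    # Results of the checks, filled in manually or by automated lookups.
    predatory_list_hit: bool
    author_retractions: int
    citation_count: Optional[int] = None


def credibility_card(s: CredibilitySignals) -> str:
    """Turn stored signals into the short summary shown next to a paper."""
    flags = []
    if s.author_retractions:
        noun = "retraction" if s.author_retractions == 1 else "retractions"
        flags.append(f"author has {s.author_retractions} {noun}")
    if s.predatory_list_hit:
        flags.append("venue appears on a predatory-publisher list")
    if flags:
        return "Proceed with caution: " + "; ".join(flags) + "."
    cited = f", {s.citation_count} citations" if s.citation_count is not None else ""
    return f"No flags found{cited}."
```

Because the card is derived from stored signals rather than stored as free text, rerunning a check later updates the card without rewriting it by hand.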
Reproducibility, when it applies
For papers where I have reproduced the work in code:
- The Jupyter notebook lives at a URL on the site (or subdomain, see below)
- A visitor can open the notebook in the browser, read the code alongside the paper, and download the .ipynb file
- If we can serve the notebook live with a kernel attached (JupyterLite, marimo, or a hosted JupyterHub), even better. Failing that, the download path is the floor.
- The notebook is linked from the paper entry, and vice versa
For papers where I have done the math by hand instead of code, the equivalent is a derivation notebook (scanned or LaTeX) linked from the paper.
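The "linked from the paper entry, and vice versa" rule can be enforced with a build-time check. This Python sketch assumes the notebook's link back to the paper lives in a markdown cell as a site-relative /research/<slug> link; the function name is hypothetical. It reads the raw .ipynb JSON (nbformat 4). In that format a cell's "source" is either one string or a list of line strings.

```python
import json
import re


def notebook_links_back(ipynb_text: str, slug: str) -> bool:
    """Return True if any markdown cell links to /research/<slug>."""
    nb = json.loads(ipynb_text)
    # Negative lookahead stops /research/foo from matching /research/foobar.
    pattern = re.compile(re.escape(f"/research/{slug}") + r"(?![\w-])")
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if pattern.search(source):
            return True
    return False
```

Running this over every paper with a notebook-url turns a broken back-link into a build failure rather than something a visitor discovers.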
SEO and subdomain question
Two options for the URL shape:
- husayngokal.com/research: same domain, same auth context, simpler. The papers benefit from the rest of the site's authority, and it is easier to cross-link from the Notebook and Mental Models surfaces.
- research.husayngokal.com: a subdomain. Cleaner separation in search engines; it can have its own sitemap and be configured separately for Google Scholar indexing. Slightly more setup.
My current bias: do /research first, and consider the subdomain when the corpus is big enough that the SEO separation actually matters. The risk of a premature subdomain split is that early on each side looks empty and Google treats both as low-authority.
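Keeping the base URL as a single parameter makes the later move cheap. This Python sketch builds a sitemap in the sitemaps.org XML format from (slug, last-modified date) pairs; the function name is hypothetical. Switching from husayngokal.com/research to research.husayngokal.com changes only `base_url`. It assumes slugs stay the same under both URL shapes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, entries: list[tuple[str, str]]) -> str:
    """Build an XML sitemap for (slug, lastmod as YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for slug, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{slug}"
        ET.SubElement(url, "lastmod").text = lastmod
    # ET.tostring with encoding="unicode" omits the declaration, so add it.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```

For example, build_sitemap("https://research.husayngokal.com", pairs) and build_sitemap("https://husayngokal.com/research", pairs) differ only in their loc values.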
Schema for the vault
Similar to the library book schema, with paper-specific fields added:
title: ...
authors: [...]
year: ...
venue: ... # journal, conference, arxiv, etc.
doi: ... # or arxiv-id
status: reading | finished | abandoned | reproducing
url: ... # publisher URL
notebook-url: ... # optional, where the reproduction lives
code-repo: ... # optional, GitHub URL
credibility: ... # short summary of the meta-search result
ai-summary-generated-at: ...
ai-summary-model: ...
tags: [...]
The markdown body carries my notes, critical evaluation, and reproduction log.
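A vault parser entry for this schema could start as small as the Python sketch below. It assumes notes use `---`-delimited frontmatter (as in Obsidian-style vaults). It handles only the flat shapes the schema uses: `key: value`, `key: [a, b]`, and trailing `# comments`. The function name is hypothetical, and a real implementation would use a proper YAML library.

```python
import re

ALLOWED_STATUS = {"reading", "finished", "abandoned", "reproducing"}


def parse_paper_note(text: str) -> tuple[dict, str]:
    """Split a vault note into (frontmatter fields, markdown body)."""
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if not match:
        raise ValueError("note has no frontmatter block")
    header, body = match.groups()
    fields: dict = {}
    for line in header.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop trailing "# comment"
        if not line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value
    if fields.get("status") not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {fields.get('status')!r}")
    return fields, body
```

Splitting on the first colon keeps titles like "Foo: Bar" intact. A title containing " #" would be truncated, which is one reason to move to a real YAML parser.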
What this would replace
Right now my reMarkable import drops every PDF into the Library, including dozens of academic papers. Those should move to /research and out of /library entirely. The Library page would become books only, which is what a Library should be.
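Before entries can move, the import needs some way to tell papers from books. One cheap heuristic: a PDF whose first page carries a DOI or an arXiv identifier is probably a paper. The Python sketch below assumes the reMarkable import can expose first-page text; the names are hypothetical. The DOI pattern is the one Crossref recommends for modern DOIs. The arXiv pattern covers only the post-2007 YYMM.NNNN(N) style and requires the printed "arXiv:" prefix, so stray decimals are not matched.

```python
import re

DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)
ARXIV_RE = re.compile(r"\barXiv:\s*\d{4}\.\d{4,5}(v\d+)?\b", re.IGNORECASE)


def target_surface(first_page_text: str) -> str:
    """Suggest /research if the first page carries a DOI or arXiv ID, else /library."""
    if DOI_RE.search(first_page_text) or ARXIV_RE.search(first_page_text):
        return "/research"
    return "/library"
```

Many academic books also carry DOIs, so this is better used to propose moves for a manual review pass than to move files silently.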
Next steps if this becomes Building
- Schema and migration
- /research index and /research/[slug] detail routes
- Vault folder + parser entry
- Reproduction-notebook hosting decision (URL on the same server vs. JupyterLite vs. Binder)
- Source-credibility meta-search MVP (manual at first, automated checks layered on)
- AI summary endpoint
- Migration: move existing academic-paper entries from /library to /research
The full feature set is a separate scope. The smallest useful version is just the surface with hand-entered metadata and notes. Everything else (AI summary, credibility check, hosted notebooks) is a layer on top.