The Chinese Immunotherapy That Passed the Dutch Test, and How to Assess the Expected Tsunami of Immunotherapies
The Dutch are rather picky in selecting which oncology drugs to prescribe or reimburse. They use a set of criteria called PASKWIL to define what is clinically meaningful. Recently, the Dutch have been quite excited about serplulimab, an immunotherapy developed in China.
It is the first immunotherapy for extensive-stage small-cell lung cancer (ES-SCLC) to meet the PASKWIL criteria.
This drug is offered on its website in China for US$290 per vial, yet in Europe it costs roughly 5.5 times more. You might think that this is due to the effort of making the drug relevant for the European market. But no… only 7 European patients in total were recruited in the trial. The effects of the drug are also rather modest: in combination with chemotherapy, it delays progression or death by about 1.4 months compared with chemotherapy alone, and the median overall survival benefit is 4.5 months (HR 0.63, 95% CI: 0.49–0.82).
Today’s question is simple: the PASKWIL criteria provide an invaluable framework for evaluating novel indications and therapies, but are HTA agencies equipped to assess these therapies when multiple compounds from the same class are tested?
Most profitable class of drugs
It is easy to lose track of how many immunotherapies like serplulimab are currently available. Nivolumab and pembrolizumab were the first to enter the market in 2014 and have since accumulated the most indications. Currently, there are at least 14 or 15 me-too versions of these drugs on the market, but this is only the beginning of what is expected to be a tsunami of such agents.
In total, more than 180 companies are developing over 200 of these immunotherapies.
The reason for such investments is profitability. Companies have found ways not to compete directly in the market. Studies are often conducted slightly differently, in different indications and settings, or in combination with different agents. We are often led to believe that one drug is slightly better than another for a specific indication based on “expert opinion” or the overinterpretation of subgroups or post-hoc analyses; but in fact there is no robust evidence to justify such conclusions.
There are few within-class head-to-head trials, and even fewer showing a survival gain of one compound over the other (see here).
The landscape of immunotherapies for extensive-stage small-cell lung cancer
Serplulimab is not the only immunotherapy authorised in Europe for ES-SCLC. Two other immunotherapies, atezolizumab and durvalumab, are also authorised. However, their effects are even smaller, with median overall survival benefits of 2 months (HR 0.76, 95% CI: 0.60–0.95) and 2.4 months (HR 0.75, 95% CI: 0.63–0.91), respectively (see the EMA assessment comparing the results here).
But there is also a history of negative trials. For instance, KEYNOTE-604, a Phase 3 trial of pembrolizumab plus chemotherapy, failed to meet the overall survival endpoint required for full FDA approval, and its accelerated approval was withdrawn for metastatic SCLC in March 2021. Also, nivolumab plus chemotherapy (EA5161) failed to reach statistical significance versus chemotherapy alone (Figure 1).
To our surprise, no HTA agencies appear to consider these negative trials in their assessments, and thus they may miss the overall picture. Surely, we do not want reimbursement decisions to be based on coincidence or bias. If one looks at the overall effect of immunotherapies in ES-SCLC, the class as a whole would not pass the Dutch test.
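To make the idea of an "overall effect" concrete, here is a minimal sketch of a fixed-effect, inverse-variance pooling of the three overall-survival hazard ratios quoted above. It is an illustration only, not the method of any HTA agency: the standard errors are back-calculated from the rounded 95% confidence intervals, the function names are hypothetical, and the negative pembrolizumab and nivolumab trials are left out because their hazard ratios are not reported in this article.

```python
import math

# Overall-survival hazard ratios and 95% CIs as quoted in the text:
# (HR, lower bound, upper bound)
TRIALS = {
    "serplulimab": (0.63, 0.49, 0.82),
    "atezolizumab": (0.76, 0.60, 0.95),
    "durvalumab": (0.75, 0.63, 0.91),
}

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Return log(HR) and its standard error, recovered from the CI width."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return log_hr, se


def pool_fixed_effect(trials):
    """Inverse-variance weighted pooling on the log hazard ratio scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lower, upper in trials.values():
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se**2  # more precise trials get more weight
        weighted_sum += weight * log_hr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z95 * pooled_se),
        math.exp(pooled_log + Z95 * pooled_se),
    )


pooled_hr, pooled_lower, pooled_upper = pool_fixed_effect(TRIALS)
print(f"Pooled HR: {pooled_hr:.2f} (95% CI: {pooled_lower:.2f}-{pooled_upper:.2f})")
```

With only the three positive trials, the pooled hazard ratio comes out at roughly 0.72, already weaker than the serplulimab estimate on its own. A complete assessment would add the negative trials to `TRIALS` in the same way, and might prefer a random-effects model if the trials are judged too heterogeneous.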

Multiplicity and me-too drug development
This introduces the concept of multiplicity, which has emerged with the surge of me-too drug development. When the same class of agents is tested again and again, some trials may well yield positive results by chance alone. In such cases, statistical estimates can be reassessed to take multiplicity into account, using methods such as the Bonferroni correction or Stein estimation.
In recent work by Carlisle and colleagues focusing on immune checkpoint inhibitor trials, the authors used Stein estimation to account for the crowding of trials with me-too compounds. They found that, after adjusting for the totality of trials, effect estimates, in terms of hazard ratio, decreased in 35% of trials, and among trials with an original p-value below 0.05, 27% lost significance after adjustment (their work here).
In other words, given the redundancy of trials within the same class of drugs, some positive results may be statistical artifacts, and this possibility should be accounted for with appropriate methods.
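The multiplicity argument can be sketched with the simpler of the two methods named above, the Bonferroni correction (this is not the Stein estimation used by Carlisle and colleagues). The sketch assumes, purely for illustration, that the five ES-SCLC agents mentioned in this article form one family of independent tests; the p-values are approximations back-calculated from the rounded hazard ratios and confidence intervals quoted earlier, and all function names are hypothetical.

```python
import math

ALPHA = 0.05  # conventional significance level
Z95 = 1.96    # normal quantile for a two-sided 95% interval

# Assumption: one trial per agent named in the article (serplulimab,
# atezolizumab, durvalumab, pembrolizumab, nivolumab), treated as independent.
N_TRIALS = 5

# Overall-survival results quoted in the text: (HR, lower bound, upper bound)
REPORTED = {
    "serplulimab": (0.63, 0.49, 0.82),
    "atezolizumab": (0.76, 0.60, 0.95),
    "durvalumab": (0.75, 0.63, 0.91),
}


def familywise_error(n_trials, alpha=ALPHA):
    """Chance of at least one false positive if no drug truly works."""
    return 1.0 - (1.0 - alpha) ** n_trials


def bonferroni_threshold(n_trials, alpha=ALPHA):
    """Per-trial significance level that keeps the family-wise error <= alpha."""
    return alpha / n_trials


def approx_two_sided_p(hr, lower, upper):
    """Approximate p-value from HR and CI via a normal approximation."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    z = abs(math.log(hr)) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


fwer = familywise_error(N_TRIALS)
threshold = bonferroni_threshold(N_TRIALS)
approx_p = {name: approx_two_sided_p(*ci) for name, ci in REPORTED.items()}
still_significant = sorted(n for n, p in approx_p.items() if p < threshold)

print(f"Family-wise error across {N_TRIALS} trials: {fwer:.1%}")
print(f"Bonferroni threshold: {threshold:.3f}")
for name, p in approx_p.items():
    print(f"{name}: approx. p = {p:.4f}")
print("Still significant after correction:", still_significant)
```

Under these assumptions, five trials of an ineffective class would produce at least one nominally positive result more than a fifth of the time, and the atezolizumab result no longer clears the stricter per-trial threshold. Since the p-values are reconstructed from rounded intervals and the family of trials is an assumption, this should be read as an illustration of the mechanism rather than a reassessment of the drugs.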
Reimbursement decisions should consider the totality of evidence
In the following scenarios, we use analogies to illustrate how reimbursement decisions are made in practice and how they are intended to be made.
Scenario 1: Imagine you are standing in a drugstore, looking at different brands of a supplement on the shelf. You read the packaging, compare prices, and then, based on your needs and budget, decide whether to choose one. This is similar to how reimbursement decisions are often made.
Scenario 2: Imagine that, before going to the drugstore, you search PubMed for studies on different variants of that supplement. You realise that the evidence is not straightforward: some studies are positive, while others are negative. In such cases, meta-analyses, systematic reviews, and even statistical approaches that account for the numerous trials can help determine whether the overall effect is clinically meaningful for you.
This is essentially what health technology assessors are expected to do. Their role is to consider all available evidence when evaluating a medicine, including negative studies involving the same class of drug for that indication. Without strong HTA, healthcare systems risk funding expensive drugs with little benefit.





