1. Feasibility

Technical Feasibility

The project’s core proposition — using LLMs to simulate how segments of society respond to policy proposals — sits at the frontier of what is currently possible, and feasibility is genuinely uneven across its components.

LLMs can plausibly model aggregate discursive tendencies: how certain demographic or ideological groups tend to frame issues, what objections they typically raise, what values they invoke. This is because LLMs are trained on text produced by humans across social contexts, and patterns of reasoning and sentiment are to some degree recoverable. Studies using GPT-class models to simulate survey responses (e.g., Argyle et al., 2023, “Out of One, Many” in Political Analysis) have shown that with appropriate conditioning, LLMs can approximate population-level opinion distributions on well-documented topics reasonably well.
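The conditioning approach used in such studies, sometimes called "silicon sampling", can be sketched in miniature as follows. Everything here is an invented illustration, not the project's method: the personas, prompt template, and sample sizes are placeholders, and the stub stands in for a real LLM API call.

```python
import random
from collections import Counter

# Hypothetical demographic personas; a real study would derive these
# from census or survey strata, with far richer backstories.
PERSONAS = [
    {"age": "65+", "language": "Mandarin-dominant", "n": 30},
    {"age": "25-34", "language": "English-dominant", "n": 70},
]

def build_prompt(persona: dict, question: str) -> str:
    """Condition the model on a demographic backstory before the question."""
    return (
        f"You are a Singapore resident, age {persona['age']}, "
        f"{persona['language']}. Answer with AGREE or DISAGREE.\n{question}"
    )

def simulate_distribution(ask_model, question: str) -> dict:
    """Draw repeated conditioned samples; return response proportions."""
    counts, total = Counter(), 0
    for persona in PERSONAS:
        for _ in range(persona["n"]):
            counts[ask_model(build_prompt(persona, question))] += 1
            total += 1
    return {k: v / total for k, v in counts.items()}

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call, purely so the sketch runs end to end."""
    p = 0.7 if "65+" in prompt else 0.4
    return "AGREE" if random.random() < p else "DISAGREE"
```

The point of the sketch is that population-level output is an aggregate over many conditioned draws; the validity question is whether those draws track real subgroup opinion, which is exactly where underrepresented corpora bite.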

However, Singapore-specific policy contexts present harder challenges. The training corpora of most frontier LLMs are heavily weighted toward Western, English-language discourse. Local cultural logics — the pragmatic deference to state authority that coexists with specific grievance patterns, the distinct ways different ethnic communities frame heritage or health — are underrepresented. Fine-tuning on locally grounded data would be necessary, and the article gives no indication of whether this is planned.

The offline data integration for underrepresented groups (elderly, less digitally active populations) is the right instinct but technically non-trivial. Bridging structured survey data with the probabilistic generative architecture of LLMs requires careful methodological choices — it is not simply a matter of feeding in survey responses as context.
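One plausible bridging strategy (offered here as an assumption, since the article does not say how integration would work) is post-stratification: simulated responses are reweighted to match survey-derived population shares rather than injected into prompts as raw context. A minimal sketch, with invented group names and numbers:

```python
from collections import defaultdict

def poststratify(responses, population_shares):
    """Weighted AGREE share: within-group simulated rates, survey weights.

    responses: list of (group, answer) pairs from simulated agents.
    population_shares: {group: population share} from offline survey/census.
    """
    by_group = defaultdict(list)
    for group, answer in responses:
        by_group[group].append(answer)

    weighted = 0.0
    for group, share in population_shares.items():
        answers = by_group.get(group)
        if not answers:
            # Missing simulated coverage should be flagged, not silently
            # skipped: this is exactly where underrepresented groups hide.
            continue
        agree_rate = sum(a == "AGREE" for a in answers) / len(answers)
        weighted += share * agree_rate
    return weighted

# Example: elderly respondents are 20% of the population but half the
# simulated sample here; weighting corrects the proportion of their
# simulated voices, though not the quality of the simulation itself.
sim = [("elderly", "AGREE"), ("elderly", "DISAGREE"),
       ("young", "AGREE"), ("young", "AGREE")]
estimate = poststratify(sim, {"elderly": 0.2, "young": 0.8})
```

Note what the weighting cannot fix: if the LLM simulates a group badly, post-stratification simply scales up a bad estimate, which is why the integration is a methodological problem and not a plumbing one.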

Institutional Feasibility

The collaboration across NUS, NTU, SMU and SUTD, with 14 principal investigators, is large by Singapore standards. Multi-institutional projects at this scale routinely face coordination costs, misaligned incentive structures, and governance friction. MOE/SSRC funding provides a stable backbone, but the five-year timeline will test whether research teams can maintain coherent integration rather than fragmenting into parallel workstreams that nominally share a platform.

The partnerships with the National Heritage Board, NUHS, and HDB Health District @ Queenstown are valuable because they provide real policy questions and potentially real validation data. But government-linked partners in Singapore also bring institutional conservatism — there may be pressure to produce findings that are policy-confirmatory rather than genuinely stress-testing.


2. Value

Academic Value

If the simulation platform achieves even modest validity, it would represent a significant methodological contribution to computational social science. The field currently lacks robust frameworks for prospective social simulation at the policy level — most LLM-based opinion research is retrospective, applied to existing survey data. A validated prospective tool would be publishable in top venues (PNAS, Nature Human Behaviour, Political Analysis) and would attract international attention.

The interdisciplinary architecture also has intrinsic academic value: forcing economists, computational linguists, sociologists and public health researchers to share a common methodological vocabulary tends to generate productive theoretical friction.

Policy and Social Value

The stated goal of complementing expensive large-scale surveys and field studies is genuinely compelling from a public resource perspective. If the platform can identify, in advance, which population segments are most likely to resist a given policy framing and why, policymakers could redesign communication strategies or implementation details before committing to costly rollouts. For a small, tightly governed state like Singapore, this kind of iterative policy refinement tool has real appeal.

There is also a democratisation argument: historically, the feedback loop between policy design and public opinion has been mediated by expensive consultants, focus groups, and surveys accessible mainly to well-resourced agencies. A validated simulation platform could, in principle, be extended to smaller statutory boards or even civil society organisations.

However, the value calculus has a darker side, discussed under scenarios below.


3. Projected Learning Outcomes

At the researcher level, the project should produce:

Methodological literacy in integrating LLMs with social science frameworks — specifically, understanding the conditions under which LLM-simulated agents can and cannot stand in for human respondents. This is a genuinely novel competency that has strong labour market and academic value.

Cross-disciplinary fluency among participants. Humanities researchers will gain exposure to computational pipelines; data scientists will be forced to engage with the theoretical literature on public opinion formation, collective behaviour, and policy legitimacy. The depth of this exchange will depend on project governance.

Domain knowledge in the specific policy areas studied — heritage conservation, preventive healthcare, public housing — which will ground what might otherwise be purely technical research in substantively important questions.

At the institutional level, NUS should gain a replicable model for large-scale interdisciplinary computational research, with lessons applicable to future SSRC tranches.


4. Theoretical Grounding and Knowledge Gains

Relevant Theoretical Frameworks

The project implicitly draws on several bodies of theory, though the article does not articulate these explicitly.

Opinion formation and deliberation theory. Habermas’s communicative rationality framework and its critics (notably Chantal Mouffe’s agonistic pluralism) are relevant to the question of what “simulating public response” actually means. Is the project modelling expressed preferences, underlying values, or strategic communication? These are not the same thing, and conflating them is a known weakness in computational opinion research.

Agenda-setting and framing theory (McCombs and Shaw; Entman) — the simulation will necessarily embed assumptions about how policy proposals are framed to simulated agents. The framing choices will shape outputs significantly, raising questions about whose framing assumptions are encoded.

Social identity theory (Tajfel and Turner) — group-based responses to policy are not simply aggregates of individual preferences; they are mediated by identity salience, in-group/out-group dynamics, and perceived distributive fairness. LLMs are weak at modelling these dynamics authentically.

Computational social science epistemology — the work of Lazer et al. (2009, Science) and the subsequent debates about “big data hubris” (Lazer et al., 2014, on Google Flu Trends) are directly relevant. The risk of over-relying on model outputs without adequate human validation, the inverse of what Dietvorst and colleagues termed “algorithm aversion,” is real.

Knowledge Gains

Substantively, the project should generate new knowledge about how Singaporean publics as represented in text corpora reason about heritage, health, and housing. Whether this translates into knowledge about how actual Singaporeans reason is the central validation question the project must answer. If the validation protocols are robust, the project could contribute foundational knowledge about the correspondence conditions between LLM simulations and real-world opinion — a question of broad scientific significance beyond Singapore.
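What a correspondence check might look like in its simplest form is a distance between the simulated and the surveyed opinion distributions. The sketch below uses total variation distance; the response options, the numbers, and the tolerance are invented placeholders, not values from the project.

```python
def total_variation(p: dict, q: dict) -> float:
    """TV distance between two categorical distributions (0 = identical)."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)

# Hypothetical simulated output vs. a held-out human survey.
simulated = {"support": 0.55, "oppose": 0.30, "neutral": 0.15}
surveyed  = {"support": 0.48, "oppose": 0.37, "neutral": 0.15}

tv = total_variation(simulated, surveyed)
acceptable = tv <= 0.10  # a real protocol would pre-register its threshold
```

A robust protocol would run such checks per demographic segment and per policy domain, since aggregate agreement can mask exactly the subgroup failures described above.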


5. Scenarios

Scenario A: Validated Success with Moderate Scope

The most probable positive scenario. The platform achieves acceptable validity for well-documented, linguistically accessible policy domains (e.g., heritage conservation framing among English-educated Singaporeans) but performs poorly for underrepresented groups and novel policy contexts. The team publishes methodologically important papers on the scope and limits of LLM-based simulation, refines the tool for the domains where it works, and produces a usable but explicitly bounded policy tool. This is genuinely valuable and scientifically credible.

Scenario B: Overreach and Policy Capture

A more concerning scenario. If the platform produces plausible-looking outputs without rigorous validation — and the pressure to demonstrate utility to government partners is real — there is a risk that policymakers treat simulation results as predictive rather than exploratory. Policy decisions become anchored to model outputs that encode the biases of training data and researcher framing assumptions. Public consultation processes are shortened or bypassed on the grounds that simulation has already “captured” public sentiment. This would represent a form of epistemic closure, substituting computational proxies for genuine public deliberation.

Scenario C: Productive Failure Generating Methodological Insight

The team discovers that LLMs systematically fail to simulate responses from specific demographic segments — the elderly, Malay-Muslim communities on heritage questions, lower-income groups on healthcare cost issues. This failure is itself a significant finding. It reveals the contours of whose discourse is and is not represented in the training data of frontier LLMs, and motivates either the collection of new locally grounded training corpora or a more fundamental theoretical reconceptualisation of what simulation can achieve. This scenario produces important negative knowledge that advances the field.

Scenario D: Platform Repurposing for Influence Operations

A scenario worth raising honestly, even if not the intent of the researchers. A validated tool for simulating how population segments respond to policy framings is structurally equivalent to a tool for identifying optimal persuasion strategies. If such a platform were adopted by communications professionals — governmental or commercial — it could be used not to stress-test policy for genuine public benefit, but to engineer consent by identifying which framings minimise resistance. The dual-use risk is not hypothetical; it is structurally inherent to the technology. The project would benefit from explicit ethical governance frameworks addressing this, analogous to the dual-use review processes in biosecurity research.

Scenario E: Productive Internationalisation

If the platform achieves credible results, it becomes a model for comparable initiatives in Southeast Asian contexts — Malaysia, Indonesia, Thailand — where the correspondence between official policy discourse and public sentiment is similarly opaque to researchers and policymakers alike. CSSH could become a regional hub, attracting comparative data and international partnerships that further validate and extend the methodology. This is the optimistic long-run scenario and is plausible if Scenario A is achieved first.


Summary Assessment

The project is feasible at a modest level of ambition, genuinely valuable if validated rigorously, and theoretically significant if it engages honestly with the epistemological literature on computational social simulation. Its greatest risks are not technical but institutional: the pressure to demonstrate policy utility may outpace the validation work necessary to make that utility credible. The dual-use question deserves explicit ethical attention. And the project’s ultimate contribution to knowledge will depend heavily on whether it is willing to publish its failures as carefully as its successes.