News

  • Two papers accepted to the Technical AI Governance Workshop at ICML 2026

    Co-authored two papers: ‘Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors’ (which started as an Apart hackathon weekend project) and ‘Exploring Systems-Thinking Approaches to Loss of Control Risk’ (which was the output of one of the SPAR projects I was supervising). Thanks to all collaborators!

  • Published a blogpost: ‘On theoretical physics, AI and human creativity’

    TLDR: OpenAI released a paper claiming GPT-5.2 derived a result in theoretical physics. By sheer coincidence, the subject of this paper is literally what I did my PhD on. I was impressed with GPT-5.2 and frustrated with the extremely poor level of public discourse around this paper, so I wrote a thorough blogpost about it. I was later told that people enjoyed it and found it useful for calibrating their views on AI capabilities in frontier maths/physics research.

  • Attended an informal one-day ‘unconference’ on AI Verification in Oxford, UK

    Discussed open problems in verification (both software and hardware aspects), the most pressing research directions, strategies for popularising AI Verification research, and even red-teaming the whole agenda.

  • Gave a talk for an online event organised by BlueDot Impact

    A 20-minute online talk covering the risk modelling agenda at SaferAI, as well as the bottlenecks and open questions.

  • Gave a talk on behalf of AI Safety Poland as part of a WAIT meetup in Wrocław, Poland

    Gave a talk about AI Safety, compute governance, AI regulation, the semiconductor supply chain and geopolitics. About 160 people in attendance. WAIT (Wrocław AI Team) is a community of AI/Data Science researchers, practitioners and enthusiasts.

  • Attended the AIxBio Symposium in Cambridge, UK

    One-day conference organised by the Cambridge Biosecurity Hub, in collaboration with the ERA Fellowship. Very interesting (and concerning) talks about the latest capabilities of LLMs in bio and wetlab contexts.

  • Taught one day of the Orion AI Governance Initiative

    Orion is a talent development scheme for students interested in working on AI policy/governance, with a focus on AI Safety. Gave a talk about compute governance.

  • AI Safety Poland was awarded USD 4800 by BlueDot Impact

    We received funding through the Rapid Grants programme to organise a series of 4 in-person events in Warsaw, Poznań, Kraków and Wrocław. Thanks to BlueDot for enabling this and thanks to Jakub Nowak for taking care of the application!

  • Attended the Alignment Workshop in London, UK

    Two-day conference organised by FAR AI.

  • Started supervising 3 SPAR projects at SaferAI

    Projects are on: (1) improving risk modelling methodology, (2) LLM forecasters and (3) applying ideas from systems thinking/safety engineering to AI-driven Loss of Control scenarios.

  • Published my project on hash collisions from the AI Security Bootcamp

    This project won 2nd prize at the bootcamp. TLDR: if you give me any two .pickle files, I can make their MD5 hashes collide. I also explain why this doesn’t work for .safetensors. One of the nerdiest and most enjoyable projects I’ve ever done, and definitely the most low-level. Not sure about its practical usefulness, though, as nobody uses MD5 anymore (and nobody should be using .pickle files!).

  • Gave an Introduction to AI Safety online talk for AI Safety Poland

    This talk kicked off our biweekly webinar series, which I organise and host. I encourage you to subscribe to our Luma calendar for news of webinars and more.

  • Started as a Research Scientist at SaferAI

    Focus on risk modelling of AI-enabled cyber misuse and Loss of Control scenarios. Based at LISA, London.

  • Attended the first edition of the AI Security Bootcamp in London, UK

    A month-long bootcamp on various topics in IT and AI security, e.g. networking, pentesting, cryptography, Docker security, reverse engineering, cloud security, jailbreaks, prompt injections, membership inference attacks and weight extraction attacks. All of it was super interesting, but also quite hard for me to follow, as it moved at a high pace and was aimed at people with more security-relevant background than mine. In any case, I enjoyed it very much. Thanks to the organisers! Teaching materials can be found here. Addendum: my project on hash collisions won 2nd prize at the bootcamp.

  • Presented a poster at the Technical AI Governance Forum in London, UK

    Poster based on my MATS project. A two-day conference on various issues in technical governance, including agentic oversight, compute governance, evaluations, geopolitics, cybersecurity, verification and forecasting.

  • Published a piece in which I look into the EU Whistleblowing Directive, analyse where it falls short in the context of AI, stress the importance of whistleblowing for upholding AI regulation, and sketch a path forward. Written and published with the generous support of Karl Koch and the AI Whistleblower Initiative.

  • Attended the AI, Animals and Digital Minds conference in London, UK

    Previously called ‘AI for Animals’. Particular highlights were Adrià Moret’s talk about his paper AI Welfare Risks, as well as listening to Jeff Sebo speak for 30 minutes from memory and without stuttering once.

  • Attended EAG London 2025

    My second EAG. Like last time, a hugely positive experience.

  • Taught one day of the Orion AI Governance Initiative

    Orion is a talent development scheme for students interested in working on AI policy/governance, with a focus on AI Safety. Gave a talk about compute governance.

  • Attended the ControlConf in London, UK

    A two-day conference on AI Control hosted by Redwood Research, FAR AI, and the UK AISI.

  • Attended the AI for Animals conference in Berkeley, California

    A two-day conference covering topics such as animal welfare, digital minds, consciousness, precision livestock farming, ethics and advocacy. Very interesting, would recommend.

  • Attended EA Global: Bay Area 2025 in Oakland, California

    My first EAG. A hugely positive experience. Talks, workshops and 1-1 meetings on a multitude of topics such as AI Safety, effective charitable giving and animal welfare.

  • Started AI Safety Poland with 3 co-organisers

    We started a Polish community for researchers, practitioners and enthusiasts of AI Safety (and security). To be absolutely transparent, the community (and an older website) did exist before, but it was largely dormant and fragmented. Beginning in February 2025, we initiated a concentrated and ongoing effort to grow the AI Safety capacity and awareness in our country. We run webinars, a reading club, local meetups and a dedicated Slack. I also run 1-1 career consultations for people interested in working on AI Safety. AISPL is available for comments/interviews with journalists.

  • Started MATS in Berkeley, California

    Worked on a compute governance project under the supervision of Janet Egan from the Center for a New American Security. Did the extension phase in London, UK, until September.

  • Completed BlueDot’s Biosecurity Fundamentals

    Very interesting course. For my project, I built a simple web app that lets users explore correlations between various epidemiological signals, as well as perform basic epidemiological forecasts.

  • Completed the introductory course on s-risks from the Centre for Reducing Suffering

    A 6-week course introducing the idea of s-risks in contexts like AI, stable totalitarianism or digital minds. Through Tobias Baumann’s book Avoiding the Worst, I also started reading about things like multi-agent cooperation, spitefulness, the Dark Triad and better political systems.

  • Attended ML4Good, a 10-day bootcamp on AI Safety in Germany

    A very good overview of AI Safety, strategy and basic ML aspects, held in a small village in the German countryside. Met some cool people and still keep in touch/collaborate with some of them.

  • Completed BlueDot’s AI Safety Fundamentals

    Completed the 8-week (part-time) Technical Alignment course and a 4-week project in which I deliberately fine-tuned an LLM to be sycophantic and looked at how this trades off with truthfulness.

  • Started the Pivotal Fellowship in London, UK

    Worked on understanding jailbreaks through the vision modality, and their lack of transferability between VLMs.

  • Started the Faculty Fellowship in London, UK

    Worked on input-space jailbreaks on vision-language models.