Co-authored two papers: ‘Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors’ (which started as an Apart hackathon weekend project) and ‘Exploring Systems-Thinking Approaches to Loss of Control Risk’ (which was the output of one of the SPAR projects I was supervising). Thanks to all collaborators!
TLDR: OpenAI released a paper claiming GPT-5.2 derived a result in theoretical physics. By sheer coincidence, the subject of this paper is literally what I did my PhD on. I was impressed with GPT-5.2 and frustrated by the extremely poor public discourse around the paper, so I wrote a thorough blog post about it. I was later told that people enjoyed it and found it useful for calibrating their views on AI capabilities in frontier maths/physics research.
|
Attended an informal one-day ‘unconference’ on AI Verification in Oxford, UK
Discussed open problems in verification, software and hardware aspects, the most pressing directions, strategies for popularising AI Verification research, and even red-teaming the whole agenda.
|
Gave a talk for an online event organised by BlueDot Impact
A 20-minute online talk covering the risk modelling agenda at SaferAI, as well as the bottlenecks and open questions.
|
Gave a talk on behalf of AI Safety Poland as part of a WAIT meetup in Wrocław, Poland
Gave a talk about AI Safety, compute governance, AI regulation, the semiconductor supply chain and geopolitics. About 160 people in attendance. WAIT (Wrocław AI Team) is a community of AI/Data Science researchers, practitioners and enthusiasts.
|
Attended the AIxBio Symposium in Cambridge, UK
One-day conference organised by the Cambridge Biosecurity Hub, in collaboration with the ERA Fellowship. Very interesting (and concerning) talks about the latest capabilities of LLMs in bio and wetlab contexts.
Orion is a talent development scheme for students interested in working on AI policy/governance, with a focus on AI Safety. Gave a talk about compute governance.
We received funding through the Rapid Grants programme to organise a series of 4 in-person events in Warsaw, Poznań, Kraków and Wrocław. Thanks to BlueDot for enabling this and thanks to Jakub Nowak for taking care of the application!
Projects are on: (1) improving risk modelling methodology, (2) LLM forecasters and (3) applying ideas from systems thinking/safety engineering to AI-driven Loss of Control scenarios.
This project won 2nd prize at the bootcamp. TLDR: if you give me any two .pickle files, I can make their MD5 hashes collide. I also explain why this doesn’t work for .safetensors. One of the nerdiest and most enjoyable projects I’ve ever done, and definitely the most low-level. Not sure about its practical usefulness, though, as nobody uses MD5 anymore (and nobody should be using .pickle files!).
This talk kicked off our biweekly webinar series, which I organise and host. I encourage you to subscribe to our Luma calendar for news of webinars and more.
A month-long bootcamp on various topics in IT and AI security, e.g. networking, pentesting, cryptography, Docker security, reverse engineering, cloud security, jailbreaks, prompt injections, membership inference attacks, weight extraction attacks. All of it was super interesting, but also quite hard for me to follow, as it was fast-paced and aimed at people with a stronger security background than mine. In any case, I enjoyed it very much. Thanks to the organisers! Teaching materials can be found here. Addendum: my project on hash collisions won 2nd prize at the bootcamp.
Poster based on my MATS project. A two-day conference on various issues in technical governance, including agentic oversight, compute governance, evaluations, geopolitics, cybersecurity, verification and forecasting.
A piece in which I look into the EU Whistleblowing Directive, analyse where it falls short in the context of AI, stress the importance of whistleblowing for upholding AI regulation, and sketch a path forward. Written and published with the generous support of Karl Koch and the AI Whistleblower Initiative.
Main output of my MATS project, later accepted as an oral presentation to the Technical AI Governance Workshop at ICML 2025. This is a very policy-oriented document; I also have a much longer and more technical version. Email me if interested.
Previously called ‘AI for Animals’. Particular highlights were Adrià Moret’s talk about his paper AI Welfare Risks, as well as listening to Jeff Sebo speak for 30 minutes from memory and without stuttering once.
A two-day conference on AI Control hosted by Redwood Research, FAR AI, and the UK AISI.
|
Attended the AI for Animals conference in Berkeley, California.
A two-day conference covering topics such as animal welfare, digital minds, consciousness, precision livestock farming, ethics and advocacy. Very interesting, would recommend.
My first EAG. A hugely positive experience. Talks, workshops and 1-1 meetings on a multitude of topics such as AI Safety, effective charitable giving or animal welfare.
We started a Polish community for researchers, practitioners and enthusiasts of AI Safety (and security). To be absolutely transparent, the community (and an older website) did exist before, but it was largely dormant and fragmented. Beginning in February 2025, we initiated a concentrated and ongoing effort to grow AI Safety capacity and awareness in our country. We run webinars, a reading club, local meetups and a dedicated Slack. I also run 1-1 career consultations for people interested in working on AI Safety. AISPL is available for comments/interviews with journalists.
|
Started MATS in Berkeley, California
Worked on a compute governance project under the supervision of Janet Egan from the Center for a New American Security. Did the extension phase in London, UK, until September.
Very interesting course. For my project, I built a simple web app that lets users explore correlations between various epidemiological signals, as well as perform basic epidemiological forecasts.
|
Completed the introductory course on s-risks from the Center for Reducing Suffering
A 6-week course introducing the idea of s-risks in contexts like AI, stable totalitarianism or digital minds. Through Tobias Baumann’s book Avoiding the Worst, I also started reading about things like multi-agent cooperation, spitefulness, the Dark Triad and better political systems.
|
Attended ML4Good, a 10-day bootcamp on AI Safety in Germany
A very good overview of AI Safety, strategy and basic ML aspects, held in a small village in the German countryside. Met some cool people and still keep in touch/collaborate with some of them.
Completed the 8-week (part-time) Technical Alignment course + a 4-week project in which I deliberately fine-tuned an LLM to be sycophantic and looked at how this trades off with truthfulness.