An Israeli soldier uses an AI facial recognition tool to photograph a Palestinian man during a raid in Wadi Qutyna, Al-Mughayer, occupied West Bank, January 17, 2025. (Avishay Mohar/Activestills)
The Israeli army is developing an AI language model using millions of intercepted conversations between Palestinians, which could accelerate the process of incrimination and arrest, a joint investigation reveals.
The Israeli army is developing a new, ChatGPT-like artificial intelligence tool and training it on millions of Arabic conversations obtained through the surveillance of Palestinians in the occupied territories, an investigation by +972 Magazine, Local Call, and the Guardian can reveal.
The AI tool — which is being built under the auspices of Unit 8200, an elite cyber warfare squad within Israel’s Military Intelligence Directorate — is what’s known as a Large Language Model (LLM): a machine-learning program capable of analyzing information and generating, translating, predicting, and summarizing text. Whereas LLMs available to the public, like the engine behind ChatGPT, are trained on information scraped from the internet, the new model under development by the Israeli army is being fed vast amounts of intelligence collected on the everyday lives of Palestinians living under occupation.
The existence of Unit 8200’s LLM was confirmed to +972, Local Call, and the Guardian by three Israeli security sources with knowledge of its development. The model was still being trained in the second half of last year, and it is unclear whether it has been deployed yet or how exactly the army will use it. However, sources explained that a key benefit for the army will be the tool’s ability to rapidly process large quantities of surveillance material in order to “answer questions” about specific individuals. Judging by how the army already uses smaller language models, it seems likely that the LLM could further expand Israel’s incrimination and arrest of Palestinians.
“AI amplifies power,” an intelligence source who has closely followed the Israeli army’s development of language models in recent years explained. “It allows operations [utilizing] the data of far more people, enabling population control. This is not just about preventing shooting attacks. I can track human rights activists. I can monitor Palestinian construction in Area C [of the West Bank]. I have more tools to know what every person in the West Bank is doing. When you hold so much data, you can direct it toward any purpose you choose.”
While the tool’s development predates the current war, our investigation reveals that, after October 7, Unit 8200 sought the assistance of Israeli citizens with expertise in the development of language models who were working at tech giants like Google, Meta, and Microsoft. With the mass mobilization of reservists at the start of Israel’s onslaught on Gaza, industry experts from the private sector began enlisting in the unit — bringing knowledge that was previously “accessible only to a very exclusive group of companies worldwide,” as one security source stated. (In response to our inquiries, Google stated that it has “employees who do reserve duty in various countries” and emphasized that the work they do in that context “is not connected to Google.” Meta and Microsoft declined to comment.)
A security camera seen overlooking the West Bank city of Hebron, January 15, 2013. (Nati Shohat/Flash90)
According to one source, Unit 8200’s chatbot has been trained on 100 billion words of Arabic obtained in part through Israel’s large-scale surveillance of Palestinians under the rule of its military — which experts warn constitutes a severe violation of Palestinian rights. “We are talking about highly personal information, taken from people who are not suspected of any crime, to train a tool that could later help establish suspicion,” Zach Campbell, a senior technology researcher at Human Rights Watch, told +972, Local Call, and the Guardian.
Nadim Nashif, director and founder of the Palestinian digital rights and advocacy group 7amleh, echoed these concerns. “Palestinians have become subjects in Israel’s laboratory to develop these techniques and weaponize AI, all for the purpose of maintaining [an] apartheid and occupation regime where these technologies are being used to dominate a people, to control their lives. This is a grave and continuous violation of Palestinian digital rights, which are human rights.”
‘We’ll replace all intelligence officers with AI agents’
The Israeli army’s efforts to develop its own LLM were first acknowledged publicly by Chaked Roger Joseph Sayedoff, an intelligence officer who presented himself as the project’s lead, in a little-noticed lecture last year. “We sought to create the largest dataset possible, collecting all the data the State of Israel has ever had in Arabic,” he explained during his presentation at the DefenseML conference in Tel Aviv. He added that the program is being trained on “psychotic amounts” of intelligence information.
According to Sayedoff, when ChatGPT’s LLM was first made available to the public in November 2022, the Israeli army set up a dedicated intelligence team to explore how generative AI could be adapted for military purposes. “We said, ‘Wow, now we’ll replace all intelligence officers with [AI] agents. Every five minutes, they’ll read all Israeli intelligence and predict who the next terrorist will be,’” Sayedoff said.
But the team was initially unable to make much progress. OpenAI, the company behind ChatGPT, rejected Unit 8200’s request for direct access to its LLM and refused to allow its integration into the unit’s internal, offline system. (The Israeli army has since made use of OpenAI’s language model, purchased via Microsoft Azure, as +972 and Local Call revealed in another recent investigation. OpenAI declined to comment for this story.)
And there was another problem, Sayedoff explained: existing language models could only process standard Arabic — used in formal communications, literature, and the media — not spoken dialects. Unit 8200 realized it would need to develop its own program, based, as Sayedoff said in his lecture, “on the dialects that hate us.”
Shadows of police CCTV cameras seen near Jaffa Gate in Jerusalem’s Old City, January 30, 2017. (Sebi Berens/Flash90)
The turning point came with the onset of the Gaza war in October 2023, when Unit 8200 began recruiting experts in language models from private tech companies as reservists. Ori Goshen, co-CEO and co-founder of the Israeli company AI21 Labs, which specializes in language models, confirmed that employees of his participated in the project during their reserve duty. “A security agency cannot work with a service like ChatGPT, so it needs to figure out how to run AI within an [internal] system that is not connected to other networks,” he explained.
According to Goshen, the benefits LLMs provide to intelligence agencies could include the ability to rapidly process information and generate lists of “suspects” for arrest. But for him, the key is their ability to retrieve data scattered across multiple sources. Rather than using “primitive search tools,” officers could simply “ask questions and get answers” from a chatbot — which could, for instance, tell an officer whether two people had ever met, or instantly determine whether a person had ever committed a particular act.
Goshen conceded, however, that blind reliance on these tools could lead to mistakes. “These are probabilistic models — you give them a prompt or a question, and they generate something that looks like magic,” he explained. “But often, the answer makes no sense. We call this ‘hallucination.’”
Campbell, of Human Rights Watch, raised a similar concern. LLMs, he said, function like “guessing machines,” and their errors are inherent to the system. Moreover, the people using these tools are often not the ones who developed them, and research shows that users tend to place undue trust in their outputs. “Ultimately, these guesses could be used to incriminate people,” he said.
Previous investigations by +972 and Local Call into the Israeli army’s use of AI-based targeting systems to facilitate its bombing of Gaza have highlighted the operational flaws inherent to such tools. For example, the army has used a program known as Lavender to generate a “kill list” of tens of thousands of Palestinians, whom the AI incriminated because they displayed characteristics that it had been taught to associate with membership of a militant group.
The army then bombed many of these individuals — usually while at home with their families — even though the program was known to have an error rate of 10 percent. According to sources, human oversight of the assassination process served merely as a “rubber stamp,” and soldiers treated Lavender’s outputs “as if it were a human decision.”
Palestinians cross Qalandiya checkpoint on their way from the West Bank to the fourth Friday prayer of Ramadan in Al-Aqsa Mosque, April 29, 2022. (Oren Ziv)
‘Sometimes it’s just a division commander who wants 100 arrests per month’
The development of a ChatGPT-style tool trained on spoken Arabic represents a further expansion of Israel’s surveillance apparatus in the occupied territories, which has long been highly intrusive. More than a decade ago, soldiers who served in Unit 8200 testified that they had monitored civilians with no connection to militant groups in order to obtain information that could be used to blackmail them — for example, regarding financial hardship, their sexual orientation, or a serious illness affecting them or a family member. The former soldiers also admitted to tracking political activists.
Alongside developing its own LLM, Unit 8200 already utilizes smaller language models that allow for the classification of information, transcription and translation of conversations from spoken Arabic to Hebrew, and efficient keyword searches. These tools make intelligence material more immediately accessible, particularly to the army’s Judea and Samaria (West Bank) Division. According to two sources, the smaller models enable the army to sift through surveillance material and identify Palestinians expressing anger at the occupation or a desire to attack Israeli soldiers or settlers.
One source described a language model currently in use that scans data and identifies Palestinians using words that indicate “troublemaking.” The source added that the army has used language models to predict who might throw stones at soldiers during operations to “demonstrate presence” — when soldiers raid a town or village in the West Bank and go door to door, storming into every house on a particular street to conduct arrests and intimidate residents.
Intelligence sources stated that the use of these language models alongside large-scale surveillance in the occupied territories has deepened Israel’s control over the Palestinian population and significantly increased the frequency of arrests. Commanders can access raw intelligence translated into Hebrew — without needing to rely on Unit 8200’s language centers to provide the material, or knowing Arabic themselves — and select “suspects” for arrest from an ever-growing list in every Palestinian locality. “Sometimes it’s just a division commander who wants 100 arrests per month in his area,” one source said.
Unlike the smaller models already in use, however, the large model currently in development is being trained with Unit 8200’s dataset of millions of conversations between Palestinians. “Spoken Arabic is data that is [hardly] available on the internet,” the source explained. “There are no transcripts of conversations or WhatsApp chats online. It doesn’t exist in the quantity needed to train such a model.”
An Israeli military watchtower and cameras over Road 60, occupied West Bank, January 30, 2006. (Activestills)
For training the LLM, everyday conversations between Palestinians that have no immediate intelligence value still serve an essential purpose. “If someone calls another person [on the phone] and tells them to come outside because they’re waiting for them outside the school — that’s just a casual conversation, it’s not interesting,” a security source explained. “But for a model like this, it’s gold, because it provides more and more data to train on.”
Unit 8200 is not alone in attempting to develop generative AI tools for intelligence work: the CIA has developed a tool similar to ChatGPT to analyze open-source information, and intelligence agencies in the UK are also developing their own LLMs. However, former British and American security officials told +972, Local Call, and the Guardian that Israel’s intelligence community is taking greater risks than its American or British counterparts when it comes to integrating AI systems into intelligence analysis.
Brianna Rosen, a former White House security official and current researcher in military and security studies at the University of Oxford, explained that an intelligence analyst using a tool like ChatGPT would potentially be able to “detect threats humans might miss, even before they arise.” However, it also “risks drawing false connections and faulty conclusions. Mistakes are going to be made, and some of those mistakes may have very serious consequences.”
Israeli intelligence sources emphasized that in the West Bank, the most pressing issue is not necessarily the accuracy of these models, but rather the vast scope of arrests they enable. The lists of “suspects” are constantly growing, as massive amounts of information are continuously collected and rapidly processed using AI.
Several sources stated that a vague or general “suspicion” is often enough to justify placing Palestinians in administrative detention — imprisonment for renewable six-month terms, without charge or trial, on the basis of undisclosed “evidence.” In an environment where surveillance of Palestinians is so extensive and the threshold for arrest is so low, they said, the addition of new AI-based tools will enhance Israel’s ability to find incriminating information on many more people.
The IDF Spokesperson did not address specific questions posed by +972, Local Call, and the Guardian “due to the sensitive nature of the information,” asserting only that “any use of technological tools is done through a rigorous process led by professionals, in order to ensure maximum accuracy of the intelligence information.”
Yuval Abraham is a journalist and filmmaker based in Jerusalem.