Methods - Charting the Ottoman Empire

Methodology

Our project follows a structured, iterative workflow that integrates Ottoman historical research with digital humanities and data science. The initial stages—decoding, transliterating, translating, and summarizing the corpus—have laid the foundation for structuring and analyzing the data. Below, we outline the key steps of our methodology:

1. Decoding and Transliteration

The first step in our workflow is deciphering and transliterating Ottoman fiscal and administrative documents, which often contain shorthand, abbreviations, and specialized terminology. Our team carefully transliterates the Perso-Arabic script into Latin characters, applying the same conventions across all documents. This process requires deep expertise in paleography, codicology, and archival research to interpret handwritten variations, marginal notes, and numerical systems accurately.

This transliteration is essential for making the texts readable, searchable, and analyzable, paving the way for further processing, including data extraction and AI-assisted translation.
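A transliteration is only searchable if spelling variants such as "Paşa" and "Pasa" lead to the same hits. The following is a minimal sketch of one common approach, assuming a hypothetical `search_key` function that folds each transliterated string into a lowercase, diacritic-free key stored next to the original text. Which characters to drop (here the ayn and hamza signs ʿ and ʾ, plus apostrophes) is an assumed convention, not necessarily the project's.

```python
import unicodedata

# Letters that NFKD does not decompose but that should fold to plain ASCII.
_EXTRA_FOLDS = str.maketrans({"ı": "i"})
# Ayn/hamza signs (assumed convention: U+02BF, U+02BE, or apostrophes) dropped from keys.
_DROP = {"\u02bf", "\u02be", "'", "\u2019"}


def search_key(transliteration: str) -> str:
    """Fold a transliterated string into a lowercase, diacritic-free search key."""
    # NFKD splits letters such as "â" or "ş" into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", transliteration.translate(_EXTRA_FOLDS))
    # Keep only characters that are neither combining marks (category "Mn") nor ayn/hamza signs.
    kept = (ch for ch in decomposed if unicodedata.category(ch) != "Mn" and ch not in _DROP)
    return "".join(kept).casefold()
```

With this, `search_key("Paşa")` and `search_key("Pasa")` both return `"pasa"`. The original transliteration is never modified; only the index key is folded, so scholarly diacritics remain intact for display.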

Figure 1. An excerpt from Codex MAD 9726 with identified and numbered decrees, followed by transliteration.

To enhance efficiency, we employ supervised machine-learning models for Named Entity Recognition (NER), which help classify people, places, institutions, and key financial terms within the documents. This automated approach supports large-scale text analysis while maintaining accuracy through human validation.
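Supervised NER models learn from token sequences labeled in a tagging scheme such as BIO (Begin, Inside, Outside). The sketch below is hypothetical and shows two pieces such a workflow needs: converting hand-annotated entity spans into BIO tags for training, and routing low-confidence model predictions to a human validator. The label names (`PER`, `LOC`), the 0.9 threshold, and the function names are illustrative assumptions, not the project's actual configuration.

```python
from typing import List, NamedTuple, Tuple

# (first token index, end token index exclusive, entity label); labels are hypothetical.
Span = Tuple[int, int, str]


class Prediction(NamedTuple):
    text: str
    label: str
    score: float  # model confidence in [0, 1]


def to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn annotated entity spans into one BIO tag per token for supervised training."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span ({start}, {end}) is outside the token list")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the same entity
    return tags


def needs_review(predictions: List[Prediction], threshold: float = 0.9) -> List[Prediction]:
    """Select predictions below the confidence threshold for human validation."""
    return [p for p in predictions if p.score < threshold]
```

For an invented phrase, `to_bio(["Mehmed", "Paşa", "Belgrad", "gümrüğü"], [(0, 2, "PER"), (2, 3, "LOC")])` returns `["B-PER", "I-PER", "B-LOC", "O"]`. Overlapping spans are not supported in this sketch; a later span simply overwrites an earlier one.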

2. Translation and Summarization

Once the documents are transliterated, they are translated and summarized to make their contents accessible to a broader audience. Given the complexity of Ottoman administrative language, which often includes formulaic expressions, legal jargon, and archaic terminology, we use a hybrid approach:

Expert human translation ensures accuracy, particularly for ambiguous or context-dependent phrases.
AI-assisted translation tools help speed up the process, with outputs carefully reviewed and refined by historians.

Figure 2. Decrees 704-708 from Codex MAD 9726.

Summaries are then created to highlight key information such as transaction details, financial records, and political decisions recorded in the documents. These summaries serve as structured data points, allowing us to link and analyze patterns across different sources.
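To make "structured data points" concrete, here is a hypothetical sketch of a summary record that also tracks the hybrid translation workflow (a machine draft versus a historian-approved text), plus a helper that links decrees sharing a named entity. The field names, the `"MAD 9726/704"` ID format, and the entity strings are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DecreeSummary:
    decree_id: str                                 # hypothetical ID format, e.g. "MAD 9726/704"
    summary: str
    entities: Set[str] = field(default_factory=set)
    machine_translation: Optional[str] = None      # draft produced by an AI-assisted tool
    reviewed_translation: Optional[str] = None     # text approved by a historian

    @property
    def is_reviewed(self) -> bool:
        # The record counts as reviewed only once a historian-approved translation is stored.
        return self.reviewed_translation is not None


def link_by_entity(summaries: List[DecreeSummary]) -> Dict[str, List[str]]:
    """Map each entity that appears in two or more decrees to those decree IDs."""
    index: Dict[str, List[str]] = defaultdict(list)
    for s in summaries:
        for entity in s.entities:
            index[entity].append(s.decree_id)
    return {e: ids for e, ids in index.items() if len(ids) > 1}
```

Keeping the machine draft and the reviewed text in separate fields preserves the provenance of every translation, so unreviewed drafts can be filtered out of published views.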

3. Data Modeling and Structuring

After translation, the extracted data undergoes a rigorous modeling and structuring process to ensure consistency and usability. Each entry is enriched with metadata, including:

Dates, recorded in both the Ottoman fiscal calendar and their Gregorian equivalents.
Geographical references, linked to historical maps and modern coordinates.
Named entities, such as individuals, institutions, and administrative units.
Financial transactions, documenting revenues, expenses, and broader economic trends.
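The four kinds of metadata above map naturally onto a small set of record types. The sketch below is hypothetical: it stores the date exactly as written alongside an optional Gregorian equivalent (the calendar conversion itself is not attempted here), validates modern coordinates, and keeps each amount together with its recorded unit so that totals never mix units. All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Place:
    name: str                      # place name as written in the source
    lat: Optional[float] = None    # modern coordinates, once the place is identified
    lon: Optional[float] = None

    def __post_init__(self) -> None:
        if self.lat is not None and not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if self.lon is not None and not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")


@dataclass
class Transaction:
    kind: str      # "revenue" or "expense" (hypothetical vocabulary)
    amount: int    # value as recorded in the source
    unit: str      # currency or unit named in the source


@dataclass
class Entry:
    decree_no: int
    date_fiscal: str                   # date as written, in the Ottoman fiscal calendar
    date_gregorian: Optional[date]     # Gregorian equivalent; None until established
    places: List[Place] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)
    transactions: List[Transaction] = field(default_factory=list)

    def total(self, kind: str, unit: str) -> int:
        """Sum the transactions of one kind in one unit (units are never mixed)."""
        return sum(t.amount for t in self.transactions if t.kind == kind and t.unit == unit)
```

Keeping `date_gregorian` optional lets an entry be entered before its date conversion has been checked, without inventing a value.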

Figure 3. Conceptual map for data modeling for the Codex MAD 9726.

Figure 4. Decrees 704-708 from Codex MAD 9726.

Figure 5. Database for analysis of Codex MAD 9726.

Figure 6. Data structuring for the Codex MAD 9726. Here, unstructured and structured data for two decrees are shown in the database.

To manage this structured dataset efficiently, we are developing a relational database (MySQL) that supports complex queries, cross-referencing, and visualization of financial and administrative records. This database serves as the foundation for network analysis and statistical modeling of the Ottoman fiscal system.
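As a rough illustration of how decrees, people, and transactions might be related, here is a hypothetical core schema. It uses Python's built-in `sqlite3` module as a self-contained stand-in for MySQL; the table and column names are assumptions, and a MySQL version would differ in details such as key and engine options.

```python
import sqlite3

# Hypothetical core schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE decree (
    decree_id       INTEGER PRIMARY KEY,
    codex           TEXT NOT NULL,          -- e.g. 'MAD 9726'
    decree_no       INTEGER NOT NULL,       -- number of the decree within the codex
    date_fiscal     TEXT,                   -- date as written in the source
    date_gregorian  TEXT,                   -- ISO 8601 equivalent, if known
    transliteration TEXT,
    translation     TEXT,
    summary         TEXT,
    UNIQUE (codex, decree_no)
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    title     TEXT
);
CREATE TABLE decree_person (                -- many-to-many link carrying the person's role
    decree_id INTEGER NOT NULL REFERENCES decree (decree_id),
    person_id INTEGER NOT NULL REFERENCES person (person_id),
    role      TEXT NOT NULL,
    PRIMARY KEY (decree_id, person_id, role)
);
CREATE TABLE financial_txn (
    txn_id            INTEGER PRIMARY KEY,
    decree_id         INTEGER NOT NULL REFERENCES decree (decree_id),
    kind              TEXT NOT NULL CHECK (kind IN ('debt', 'asset')),
    amount            INTEGER NOT NULL,
    settlement_method TEXT
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema in a new SQLite database and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn
```

The `decree_person` link table is the key design choice: one person can appear in many decrees and one decree can name many people, each with a role, which is what later makes cross-referencing and network export straightforward.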

4. SQL Querying and Visualization

Once structured, the data is stored in our relational SQL database, enabling:

Searchable text archives for exploring transliterated and translated documents.
Interactive tables that allow users to dynamically analyze financial and administrative data.
Network visualizations, illustrating relationships between individuals, institutions, and transactions.
Interconnected datasets, linking documents based on people, locations, and financial activities.
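Linking documents through shared people comes down to a self-join on the person–decree link table. The sketch below assumes the hypothetical `person` and `decree_person` tables used earlier (again with `sqlite3` standing in for MySQL) and returns each pair of decrees that name the same person; the toy rows are invented.

```python
import sqlite3
from typing import List, Tuple


def decrees_sharing_people(conn: sqlite3.Connection) -> List[Tuple[int, int, str]]:
    """Return (decree, decree, person) triples for every pair of decrees naming the same person."""
    return conn.execute(
        """
        SELECT DISTINCT a.decree_id, b.decree_id, p.name
        FROM decree_person AS a
        JOIN decree_person AS b
          ON a.person_id = b.person_id
         AND a.decree_id < b.decree_id      -- each unordered pair once, no self-pairs
        JOIN person AS p ON p.person_id = a.person_id
        ORDER BY p.name, a.decree_id, b.decree_id
        """
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Toy in-memory data (invented, not from the codex) with only the two link tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE decree_person (decree_id INTEGER, person_id INTEGER, role TEXT);
        INSERT INTO person VALUES (1, 'Person A'), (2, 'Person B');
        INSERT INTO decree_person VALUES
            (704, 1, 'creditor'), (705, 1, 'debtor'), (705, 1, 'witness'), (706, 2, 'debtor');
        """
    )
    return conn


if __name__ == "__main__":
    print(decrees_sharing_people(demo_connection()))  # [(704, 705, 'Person A')]
```

`DISTINCT` matters here: Person A appears twice in decree 705 (as debtor and as witness), and without it the same pair would be returned once per role.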

Figure 7. This graph shows the percentage of debts or receivables settled by each method. Debt assignment, indicated in red, was the most common method. However, inheritance and property transactions, often involving the deceased Pasha's family, were handled individually.
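A chart like Figure 7 reduces to a grouped share calculation. The hypothetical query below assumes a `financial_txn` table with `amount` and `settlement_method` columns and weights each method by the amount settled; whether the figure weights by amount or by number of transactions is an assumption here (replace `SUM(amount)` with `COUNT(*)` in both places for a count-based share). The rows are invented toy values, not data from the codex.

```python
import sqlite3
from typing import List, Tuple


def settlement_shares(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Share of the total settled amount per method, in percent, rounded to one decimal."""
    return conn.execute(
        """
        SELECT settlement_method,
               ROUND(100.0 * SUM(amount) / (SELECT SUM(amount) FROM financial_txn), 1) AS pct
        FROM financial_txn
        GROUP BY settlement_method
        ORDER BY pct DESC
        """
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory table with invented toy amounts for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE financial_txn (amount INTEGER NOT NULL, settlement_method TEXT)")
    conn.executemany(
        "INSERT INTO financial_txn VALUES (?, ?)",
        [(600, "debt assignment"), (150, "debt assignment"), (150, "inheritance"), (100, "property sale")],
    )
    return conn


if __name__ == "__main__":
    print(settlement_shares(demo_connection()))
    # [('debt assignment', 75.0), ('inheritance', 15.0), ('property sale', 10.0)]
```

The scalar subquery computes the grand total once, so each group's share is its own sum divided by the overall sum, for example 750 / 1000 = 75.0% for debt assignment in the toy data.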

This stage ensures that the data is accessible, reusable, and preserved for long-term research. The database is designed to capture the hierarchical and relational nature of Ottoman financial records, facilitating collaborative research and advanced analysis.

5. Network Analysis

A key innovation of this project is the application of digital humanities tools to construct network visualizations that reveal the complex relationships within the Ottoman fiscal system. We transfer data from MySQL to Gephi to build interactive network graphs, which illustrate:

Social and administrative connections among key figures in the Ottoman financial bureaucracy.
Economic flows and transaction patterns across different regions and institutions.
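Gephi can import a graph as two spreadsheets: a node table with an `Id` column (plus optional `Label` and attribute columns) and an edge table with `Source` and `Target` columns (plus optional `Type` and attributes). The following is a minimal sketch of that transfer, assuming query rows of the hypothetical form `(txn_id, kind, amount, actor, role)`. It follows the encoding described for Figure 8: debt transactions become "passive" nodes, asset transactions "active" nodes, actors form a third kind, and each edge carries the actor's role.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str, int, str, str]  # (txn_id, kind, amount, actor, role), hypothetical


def export_gephi(rows: Iterable[Row]) -> Tuple[str, str]:
    """Build Gephi-style node and edge CSV text from transaction-actor rows."""
    nodes: Dict[str, Tuple[str, str, object]] = {}
    edges: List[Tuple[str, str, str, str]] = []
    for txn_id, kind, amount, actor, role in rows:
        t_node = f"T{txn_id}"
        # Debts are 'passive' and assets 'active' transaction nodes; amount can drive node size.
        nodes[t_node] = (f"Transaction {txn_id}", "passive" if kind == "debt" else "active", amount)
        a_node = f"A:{actor}"
        nodes.setdefault(a_node, (actor, "actor", ""))  # actors carry no amount
        edges.append((a_node, t_node, "Directed", role))

    nodes_buf = io.StringIO()
    writer = csv.writer(nodes_buf, lineterminator="\n")
    writer.writerow(["Id", "Label", "Kind", "Amount"])
    for node_id, (label, node_kind, node_amount) in nodes.items():
        writer.writerow([node_id, label, node_kind, node_amount])

    edges_buf = io.StringIO()
    writer = csv.writer(edges_buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Role"])
    writer.writerows(edges)
    return nodes_buf.getvalue(), edges_buf.getvalue()
```

Once imported, nodes can be colored by `Kind` and sized by `Amount`, and edges colored by `Role`. In practice, the rows would come from a MySQL query, and the two strings would be written to files for Gephi's spreadsheet import.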

Figure 8. Graph of transactions and actors: orange nodes show 'passive' (debt) transactions, purple nodes show 'active' (asset) transactions, and grey nodes represent actors. Node size reflects transaction amounts; edge colors indicate actor roles.

To enhance our analysis, we also integrate the data into a Neo4j graph database, enabling dynamic exploration of relationships between individuals, assets, and state transactions. This approach provides a deeper understanding of the structural and functional dynamics of Ottoman financial networks.
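Loading the same rows into Neo4j is a matter of issuing parameterized Cypher `MERGE` statements, so that re-running the import does not duplicate actors or transactions. The sketch below only builds `(query, parameters)` pairs and does not connect to a database. The node labels `Actor` and `Transaction`, the relationship type `PARTICIPATED_IN`, and the row shape are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical graph model: (:Actor)-[:PARTICIPATED_IN {role}]->(:Transaction)
MERGE_TXN = (
    "MERGE (a:Actor {name: $actor}) "
    "MERGE (t:Transaction {txn_id: $txn_id}) "
    "SET t.kind = $kind, t.amount = $amount "
    "MERGE (a)-[:PARTICIPATED_IN {role: $role}]->(t)"
)

# Example exploration query: total transaction amount linked to each actor.
ACTOR_TOTALS = (
    "MATCH (a:Actor)-[:PARTICIPATED_IN]->(t:Transaction) "
    "RETURN a.name AS actor, sum(t.amount) AS total ORDER BY total DESC"
)


def cypher_statements(rows: Iterable[Tuple[int, str, int, str, str]]) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn (txn_id, kind, amount, actor, role) rows into parameterized MERGE statements."""
    return [
        (MERGE_TXN, {"actor": actor, "txn_id": txn_id, "kind": kind, "amount": amount, "role": role})
        for txn_id, kind, amount, actor, role in rows
    ]
```

With the official `neo4j` Python driver, each pair could be executed inside a session as `session.run(query, params)`. Passing values as parameters rather than formatting them into the query string avoids quoting problems with names that contain apostrophes or diacritics.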

6. Interpretation and Research Applications

The final step in this phase involves interpreting the structured data within broader historical contexts. Our interdisciplinary approach—combining Ottoman history, economic history, and digital humanities—enables new insights into governance, taxation, and social structures. This stage also sets the foundation for future expansions, including the integration of additional archival sources and collaborative research initiatives.