Offer description. Type: Freelance
Applicants
Offering: Architect of Storage Solutions (all genders)
by Julia Baumgarten in Berlin | Company „Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.“ | Freelance
Wikidata is Wikimedia’s knowledge graph, which acts as a backbone for Wikipedia and other Wikimedia projects. It is also a significant part of the Linked Open Data network and, because it is publicly available, anyone can access its data. Wikimedia Germany is looking for a storage solutions consultant to help us evolve how Wikidata stores and serves its data in the decades to come.
Wikidata is a wiki that everyone can edit, either manually on wiki pages or programmatically. Data from Wikidata’s graph can be accessed in several ways: through various web APIs, a dedicated SPARQL API and query UI, or periodically published data snapshots (data dumps).
Project details
- Seniority: senior level
- Starting date: mid-February 2026
- Duration: 4-6 weeks (we’re not planning to hire on a permanent basis)
- Hours per week: 30-40
- Location: Germany / remote
Scope
We are seeking an experienced storage solutions architect on a freelance contract. You will liaise with the product development, SRE and platform teams that support the existing system, analyze it, and develop potential approaches to storing Wikidata’s data that support its strategic goals and growth over the period 2026-2035.
Technical background information
Wikidata’s primary data storage is a MediaWiki relational database (MariaDB) that stores data objects as string representations of JSON objects. Several secondary storage approaches, each optimized for a particular use case, have been introduced:
- A dedicated SQL table for labels (“titles”)
- Dedicated SQL tables storing “links” between different elements of the knowledge graph
- An Elasticsearch index for search
- An RDF triplestore that enables SPARQL querying
Wikidata’s primary data storage in numbers (as of January 2026)
- Database size: 1.2 TB (900 GB in January 2025)
- Average read rate: 1.98 M rows/second
- Average write rate: 5K rows/second
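The figures above allow a rough back-of-the-envelope check of how fast the primary storage is growing and how read-heavy its workload is. The Python sketch below is illustrative only: the two projection models (a constant yearly growth rate versus a constant yearly increase in TB) are assumptions for illustration, not forecasts by Wikimedia Germany, and all function names are hypothetical.

```python
# Figures taken from the posting (primary storage, Jan 2025 and Jan 2026).
SIZE_JAN_2025_TB = 0.9
SIZE_JAN_2026_TB = 1.2
READ_ROWS_PER_SECOND = 1.98e6
WRITTEN_ROWS_PER_SECOND = 5e3


def yearly_growth_rate(previous_tb: float, current_tb: float) -> float:
    """Relative growth between two yearly snapshots."""
    return current_tb / previous_tb - 1.0


def project_compound(current_tb: float, rate: float, years: int) -> float:
    """ASSUMPTION: the relative growth rate stays constant every year."""
    return current_tb * (1.0 + rate) ** years


def project_linear(current_tb: float, delta_tb: float, years: int) -> float:
    """ASSUMPTION: the absolute yearly increase (in TB) stays constant."""
    return current_tb + delta_tb * years


rate = yearly_growth_rate(SIZE_JAN_2025_TB, SIZE_JAN_2026_TB)
delta = SIZE_JAN_2026_TB - SIZE_JAN_2025_TB
read_write_ratio = READ_ROWS_PER_SECOND / WRITTEN_ROWS_PER_SECOND

if __name__ == "__main__":
    print(f"Growth Jan 2025 -> Jan 2026: {rate:.1%}")
    print(f"Rows read per row written: {read_write_ratio:.0f}")
    for year in (2030, 2035):
        years = year - 2026
        print(
            f"{year}: {project_linear(SIZE_JAN_2026_TB, delta, years):.1f} TB (linear), "
            f"{project_compound(SIZE_JAN_2026_TB, rate, years):.1f} TB (compound)"
        )
```

Run as a script, this prints a yearly growth of about 33.3% and roughly 396 rows read per row written. For 2035 the two models give about 3.9 TB (linear) and 16.0 TB (compound), which shows how much a ten-year storage plan depends on which growth model turns out to be closer to reality.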
Examples of known limitations and risks of the current storage approach
- The SQL tables storing all versions of Wikidata data objects, together with their relevant metadata, have grown too big to be served efficiently: for every version of a data object, an entry is stored permanently in the respective table.
- These tables are approaching a size at which they will no longer fit in the memory of the SQL servers.
- Wikipedias and other systems that use data from Wikidata see a significant increase in database writes whenever data in Wikidata changes.
Required Skillset
- You have extensive experience designing storage solutions for high-performance web applications and other systems.
- You have designed and maintained relational databases, graph databases, and NoSQL storage (e.g. document stores).
- You’re fluent in open source storage technology.
- You’re available to focus on Wikidata storage analysis as your primary task for a period of several weeks.
Interested?
Please send us your application documents via our job portal. Tell us what interests you about working with us on this project, and describe your experience designing and implementing storage for high-availability, data-heavy websites or web applications, with links where possible.
We kindly ask you not to include photos or information on your date of birth, marital status or parents. Wikimedia Germany is committed to equal opportunities and does not discriminate on the basis of, for example, ethnic origin, citizenship, religion or belief, political or other convictions, gender, age, disability, or sexual identity. We would like to address you in whatever way feels most comfortable for you, so please share your preferred name and pronouns if you wish.