Petabyte-Scale Archiving Infrastructure
Documenos Archive Management Technical Architecture
1. Scalable Infrastructure
- Built on ASP.NET Core
- Compatible with PostgreSQL 15 and later
- Architecture that supports both horizontal and vertical scaling
- Cluster support
- Seamless access behind a load balancer
- All servers can connect to the same database.
- Horizontal capacity can be expanded by adding servers.
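The scaling model above (identical servers behind a load balancer, all sharing one database) can be illustrated with a minimal Python sketch. It assumes simple round-robin dispatch; the connection string, the `AppServer` and `RoundRobinBalancer` names, and the balancing strategy are hypothetical and are not taken from the product.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical connection string: every application server points at the same database.
SHARED_DB_DSN = "postgresql://db.example.internal:5432/archive"


@dataclass
class AppServer:
    name: str
    db_dsn: str = SHARED_DB_DSN
    handled: list = field(default_factory=list)

    def handle(self, request_id: str) -> str:
        # No request state lives on the server (apart from this demo log),
        # so any server can answer any request by reading the shared database.
        self.handled.append(request_id)
        return f"{self.name} served {request_id}"


class RoundRobinBalancer:
    """Hands each incoming request to the next server in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = itertools.cycle(self.servers)

    def add_server(self, server: AppServer) -> None:
        # Horizontal scaling: add a node, then restart the rotation over all nodes.
        self.servers.append(server)
        self._rotation = itertools.cycle(self.servers)

    def dispatch(self, request_id: str) -> str:
        return next(self._rotation).handle(request_id)


if __name__ == "__main__":
    lb = RoundRobinBalancer([AppServer("app-1"), AppServer("app-2")])
    for i in range(4):
        print(lb.dispatch(f"req-{i}"))
    lb.add_server(AppServer("app-3"))
    print(lb.dispatch("req-4"))  # the rotation restarts at app-1
```

Because the servers hold no local state, adding a node only changes the rotation. A production load balancer would also perform health checks and possibly session affinity, which this sketch omits.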
2. Archive Tiering and Data Flow
Background services provide:
- Automatic upload from the file system to the archive
- Upload of data to the archive from FTP
- Transfer of data between archive tiers
- Batch extract operations
- Physical purging of deleted data
- Services can be controlled on a per-server basis.
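Per-server control and tier transfer can be sketched together as follows. This is a minimal illustration under stated assumptions: the service names, the `SERVER_SERVICES` configuration, and the age-based tiering rule (items move to a colder tier after 30 or 365 days without access) are all hypothetical, since the source does not specify how tier transfers are decided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical per-server configuration: which background services each node runs.
SERVER_SERVICES = {
    "app-1": {"fs_import", "ftp_import", "batch_extract"},
    "app-2": {"tier_transfer", "purge_deleted"},
}

# Hypothetical age-based tier policy, ordered from hottest to coldest.
TIER_POLICY = [
    ("hot", timedelta(days=0)),
    ("warm", timedelta(days=30)),
    ("cold", timedelta(days=365)),
]


def service_enabled(server: str, service: str) -> bool:
    """True if the given background service is switched on for this server."""
    return service in SERVER_SERVICES.get(server, set())


def target_tier(last_access: datetime, now: datetime) -> str:
    """Pick the coldest tier whose minimum age the item has reached."""
    age = now - last_access
    tier = TIER_POLICY[0][0]
    for name, min_age in TIER_POLICY:
        if age >= min_age:
            tier = name
    return tier


@dataclass
class ArchiveItem:
    item_id: str
    tier: str
    last_access: datetime


def run_tier_transfer(server: str, items, now: datetime):
    """One pass of the transfer service; returns (item_id, from_tier, to_tier) moves."""
    if not service_enabled(server, "tier_transfer"):
        return []  # this node does not run the transfer service
    moves = []
    for item in items:
        dest = target_tier(item.last_access, now)
        if dest != item.tier:
            moves.append((item.item_id, item.tier, dest))
            # A real service would copy the payload first, then update the metadata.
            item.tier = dest
    return moves
```

Keeping the on/off switch per server lets heavy jobs such as tier transfer or purging run on dedicated nodes while the other nodes keep serving users.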
3. Content Processing Services
The following services process archived content:
- Image processing (thumbnails, resizing, EXIF/IPTC metadata extraction)
- PDF text indexing
- OCR tasks
- Speech-to-text conversion for video files (60 languages)
- Speech-to-text conversion for audio files (60 languages)
- The generated text is saved to the search indexes.
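The flow above (route each file to the relevant processing tasks, then index the extracted text) might look roughly like this in Python. The extension-to-task mapping and the task names are hypothetical, and the toy inverted index stands in for whatever search engine the product actually uses.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to the processing tasks it triggers.
TASKS_BY_EXTENSION = {
    ".jpg": ["thumbnail", "resize", "exif_iptc"],
    ".jpeg": ["thumbnail", "resize", "exif_iptc"],
    ".png": ["thumbnail", "resize"],
    ".tif": ["thumbnail", "ocr"],
    ".pdf": ["pdf_text"],
    ".mp4": ["video_speech_to_text"],
    ".mp3": ["audio_speech_to_text"],
    ".wav": ["audio_speech_to_text"],
}


def plan_tasks(filename: str) -> list:
    """Return the processing tasks for a file, based on its (case-insensitive) extension."""
    return TASKS_BY_EXTENSION.get(PurePath(filename).suffix.lower(), [])


def tokenize(text: str) -> list:
    # \w is Unicode-aware in Python 3, so non-Latin transcripts tokenize too.
    return re.findall(r"\w+", text.lower())


class SearchIndex:
    """Toy inverted index: token -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Store text produced by OCR, PDF extraction, or speech-to-text.
        for token in tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> set:
        """Return the ids of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```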
4. Data Model and Metadata Capacity
Support for up to 1.4 billion tables in dynamic datasets
For each table in dynamic datasets:
- Recommended usage: up to 250 fields
- Technical upper limit: 1,600 fields
- Data storage capacity: approximately 4.2 billion rows
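The two field limits can be enforced when a table definition is created. The sketch below uses the 250 and 1,600 figures from the list above; the 1,600 figure matches PostgreSQL's documented per-table column limit. The function name, the duplicate-name check, and the message format are hypothetical.

```python
from collections import Counter

RECOMMENDED_MAX_FIELDS = 250  # recommended usage per table
HARD_MAX_FIELDS = 1600        # technical upper limit per table


def check_table_fields(field_names):
    """Validate a proposed table definition; returns (accepted, message)."""
    # Hypothetical extra rule: field names must be unique within a table.
    duplicates = sorted(name for name, n in Counter(field_names).items() if n > 1)
    if duplicates:
        return False, f"duplicate field names: {duplicates}"
    count = len(field_names)
    if count > HARD_MAX_FIELDS:
        return False, f"{count} fields exceeds the technical limit of {HARD_MAX_FIELDS}"
    if count > RECOMMENDED_MAX_FIELDS:
        # Accepted, but flagged: above the recommended size.
        return True, f"warning: {count} fields exceeds the recommended {RECOMMENDED_MAX_FIELDS}"
    return True, ""
```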
Datasets also support:
- Authorization-based visibility
- Excel upload and download
- Automatic data updates via event triggers
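Authorization-based visibility and event-triggered updates are sketched below in simplified form. It assumes visibility is granted through group membership and that triggers are plain publish/subscribe handlers. The `Dataset` and `EventBus` classes and the `document_archived` event name are hypothetical; the source does not describe the actual trigger mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    allowed_groups: set
    rows: list = field(default_factory=list)


def visible_datasets(datasets, user_groups):
    """Names of the datasets shared with at least one of the user's groups."""
    groups = set(user_groups)
    return [d.name for d in datasets if d.allowed_groups & groups]


class EventBus:
    """Minimal publish/subscribe hub: handlers run when their event is emitted."""

    def __init__(self):
        self._handlers = {}

    def on(self, event: str, handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload) -> None:
        for handler in self._handlers.get(event, []):
            handler(payload)


if __name__ == "__main__":
    invoices = Dataset("invoices", {"finance"})
    bus = EventBus()
    # Hypothetical trigger: each archived document appends a row to the dataset.
    bus.on("document_archived", lambda payload: invoices.rows.append(payload))
    bus.emit("document_archived", {"doc_id": "42"})
    print(invoices.rows)
```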
5. Security and Access Control
- LDAP integration
- Built-in password system
- MFA (OTP, Email, SMS)
- IP-based access restrictions
- Group-based authorization
- Declaration that the software contains no backdoors
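Two of the controls above, IP-based restrictions and OTP-based MFA, can be sketched with the Python standard library alone. This sketch assumes the OTP factor is time-based (TOTP, RFC 6238, HMAC-SHA1, 30-second steps), a common choice the source does not confirm. The allowed networks and function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress
import struct
import time

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def ip_allowed(addr: str) -> bool:
    """True if the client address falls inside any allowed network."""
    ip = ipaddress.ip_address(addr)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(ip in net for net in ALLOWED_NETWORKS)


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    t = time.time() if at is None else at
    return hotp(secret, int(t // step), digits)


def verify_totp(secret: bytes, code: str, at=None, step: int = 30, window: int = 1) -> bool:
    """Accept the code for the current step or +/- `window` steps (clock drift)."""
    t = time.time() if at is None else at
    counter = int(t // step)
    return any(
        hmac.compare_digest(hotp(secret, c, len(code)), code)
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )
```

The drift window trades a little security for tolerance of clock skew between the server and the user's authenticator app. Email and SMS codes would usually be random one-time values stored server-side instead.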