JSON Documents Performance, Storage and Search: MongoDB vs PostgreSQL
hackernews
🔬 Research
#json
#mongodb
#postgresql
#review
#database
#performance-comparison
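Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This article compares how MongoDB, a document-oriented database, and PostgreSQL, a relational database, handle JSON documents, and how they perform. The author explains the difference between MongoDB's flexible schema design and PostgreSQL's strict data-integrity rules, and emphasizes that PostgreSQL has greatly strengthened its document storage capabilities through JSONB support. The author also announces a plan to compare the two databases' performance, storage efficiency and search speed through hands-on tests, using two data models of different complexity: Account and Product.
Original text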
JSON Documents Performance, Storage and Search: MongoDB vs PostgreSQL

Collections of Documents vs Tables of Rows

Collections of Documents is an alternative approach to organizing data in databases. The most widespread and battle-proven way is the relational, SQL way: Tables of Rows. What is the difference?

In SQL, we have tables containing individual rows. Tables have strict schemas that every row must obey; there are columns with types and other possible constraints: unique, not null, value checks or references to rows of other tables. Referential integrity lies at the heart of this approach: the guarantee that if row B1 of table B references row A1 of table A, the referenced row (A1) must exist; orphan rows are not allowed. If we want to delete row A1, there are two options:
- delete B1 first, so that A1 is not referenced anywhere
- have the deletion of A1 cascade to B1, automatically deleting it as well

Tables of Rows in SQL are therefore focused on explicit schemas, enforced types, constraints, validation and relationships between tables, all openly defined and carefully guarded.

Collections of Documents, on the other hand, offer a much more relaxed approach. Collections are just namespaces where we insert documents. Documents are objects of any schema and format, but in practice they are almost always JSON. There are no enforced types, no constraints and no guarded references between documents in different collections. Documents in the same collection might have completely different schemas; flexibility and openness to any data and field types rule here.

In tables, rows have columns of (mostly) simple, scalar types: numbers, ids, strings, dates, timestamps and so on. In collections, documents have fields of both simple and composite types, such as arrays and other documents nested inside. The same field might even have different types in different documents of the same collection; almost anything is allowed here.
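To make the two deletion options concrete, here is a minimal in-memory sketch in Java, the language of the article's record definitions. It is not how a database implements foreign keys; the class `ReferentialIntegrityDemo`, its methods and the `cascade` flag are hypothetical names chosen for illustration. Rows of table B hold a reference to a row of table A, orphans are rejected on insert, and deleting a referenced A row either fails (so B rows must be deleted first) or cascades to them, roughly like a SQL foreign key with its default behavior versus `ON DELETE CASCADE`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of two tables with a foreign key B.aId -> A.id.
public class ReferentialIntegrityDemo {
    // Table A: row id -> name
    private final Map<String, String> tableA = new HashMap<>();
    // Table B: row id -> referenced A row id (the "foreign key" column)
    private final Map<String, String> tableB = new HashMap<>();

    void insertA(String id, String name) {
        tableA.put(id, name);
    }

    // Inserting into B is rejected if the referenced A row does not exist (no orphans).
    void insertB(String id, String aId) {
        if (!tableA.containsKey(aId)) {
            throw new IllegalStateException("A row " + aId + " does not exist");
        }
        tableB.put(id, aId);
    }

    void deleteB(String id) {
        tableB.remove(id);
    }

    // cascade == false: refuse to delete an A row that is still referenced.
    // cascade == true: delete the A row together with every B row referencing it.
    void deleteA(String id, boolean cascade) {
        if (tableB.containsValue(id) && !cascade) {
            throw new IllegalStateException("A row " + id + " is still referenced");
        }
        // Removing through the values() view also removes the whole B mapping.
        tableB.values().removeIf(aId -> aId.equals(id));
        tableA.remove(id);
    }

    int sizeA() { return tableA.size(); }
    int sizeB() { return tableB.size(); }

    public static void main(String[] args) {
        ReferentialIntegrityDemo db = new ReferentialIntegrityDemo();
        db.insertA("A1", "first");
        db.insertB("B1", "A1");
        db.insertA("A2", "second");
        db.insertB("B2", "A2");

        // Deleting a referenced row without cascading is rejected.
        try {
            db.deleteA("A1", false);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Option 1: delete B1 first, so A1 is no longer referenced.
        db.deleteB("B1");
        db.deleteA("A1", false);

        // Option 2: let the deletion of A2 cascade to B2.
        db.deleteA("A2", true);

        System.out.println("A rows: " + db.sizeA() + ", B rows: " + db.sizeB());
    }
}
```

In a document collection nothing like `insertB`'s existence check happens: a document can store the id of a document that no longer exists, and keeping such references consistent becomes the application's job.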
Why all this context, when our main goal is simply to compare the level of JSON document support in Mongo, the Documenter, and Postgres, the Elephant? Well, MongoDB was designed and created as a document database first and foremost, not an SQL one (hence NoSQL). It is focused on and optimized for this particular use case and way of storing and accessing data. PostgreSQL, on the other hand, is a relational SQL database that later added support for composite column types like JSON/JSONB, ARRAY and others. Over the years, it has extended and optimized storage of JSON documents in its own binary JSONB format, and added more ways to index, query and modify data of this type.

Let's dive in, then, and see how JSON document performance, storage and search compare. Does MongoDB still have an edge as a document-oriented database, for JSON in particular? Or is Postgres better? Or at least good enough to stick with, since it is a more universal database offering a richer feature set and wider applicability?

Performance

For the numbers-first audience, the summarized results are here.
Setup

To test performance from multiple angles, we will operate on two different collections with the following schema:

    import java.time.Instant;
    import java.util.List;
    import java.util.UUID;

    record Account(UUID id, String name, String type, List owners,
                   Instant createdAt, Instant updatedAt, long version) {}

    record Product(UUID id, String name, String description,
                   List categories, List tags, List variations, List relatedProducts,
                   Instant createdAt, Instant updatedAt, long version) {

        record Variation(String type, String value) {}
    }

They are defined in both databases as:

    // MongoDB
    db.createCollection("accounts");
    // _id field is always indexed by default;
    // 1 means ascending order
    db.accounts.createIndex({ createdAt: 1 }, { name: "accounts_created_at_idx" });
    db.accounts.createIndex({ owners: 1 }, { name: "accounts_owners_idx" });

    db.createCollection("products");
    db.products.createIndex({ name: 1 }, { name: "products_name_unique_idx", unique: true });
    db.products.createIndex({ categories: 1 }, { name: "products_categories_idx" });
    db.products.createIndex({ tags: 1 }, { name: "products_tags_idx" });
    db.products.createIndex({ createdAt: 1 }, { name: "products_created_at_idx" });

    -- PostgreSQL
    CREATE TABLE accounts (data JSONB NOT NULL);
    CREATE UNIQUE INDEX accounts_id ON accounts ((data->>'id'));
    CREATE INDEX accounts_created_at_idx ON accounts ((data->>'createdAt'));
    CREATE INDEX accounts_owners_idx ON accounts USING GIN ((data->'owners'));

    CREATE TABLE products (data JSONB NOT NULL);
    CREATE UNIQUE INDEX products_id ON products ((data->>'id'));
    CREATE UNIQUE INDEX products_name_unique_idx ON products ((data->>'name'));
    CREATE INDEX products_categories_idx ON products USING GIN ((data->'categories'));
    CREATE INDEX products_tags_idx ON products USING GIN ((data->'tags'));
    CREATE INDEX products_created_at_idx ON products ((data->>'createdAt'));

Documents of the products collection are intentionally designed to be more complex and larger than accounts; I want to see what happens, what is the perform…
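Both the MongoDB index on `owners` (MongoDB makes it a multikey index because the field holds an array) and the PostgreSQL GIN index on `data->'owners'` create one index entry per array element, so finding the accounts of a single owner does not require scanning every document. The Java sketch below shows that inverted-index idea in a simplified form; it is not how either database lays out its index on disk. The class `OwnersIndex` and its methods are hypothetical, and because the source does not show the element type of `owners`, owners are modeled as plain strings here as an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical, simplified inverted index: array element -> ids of accounts containing it.
// Assumption: owners are plain strings (the source does not show their element type).
public class OwnersIndex {
    private final Map<String, Set<UUID>> index = new HashMap<>();

    // One index entry per array element, which is the core idea of a multikey or GIN index.
    void add(UUID accountId, List<String> owners) {
        for (String owner : owners) {
            index.computeIfAbsent(owner, k -> new HashSet<>()).add(accountId);
        }
    }

    // Accounts whose owners array contains the given owner; roughly what
    // { owners: "bob" } asks in MongoDB or data->'owners' @> '["bob"]' in PostgreSQL.
    Set<UUID> findByOwner(String owner) {
        return Collections.unmodifiableSet(index.getOrDefault(owner, Collections.emptySet()));
    }

    // Accounts containing ALL of the given (non-empty list of) owners:
    // intersect the per-element id sets instead of scanning documents.
    Set<UUID> findByAllOwners(List<String> owners) {
        Set<UUID> result = null;
        for (String owner : owners) {
            Set<UUID> ids = findByOwner(owner);
            if (result == null) {
                result = new HashSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        OwnersIndex idx = new OwnersIndex();
        UUID a1 = UUID.randomUUID();
        UUID a2 = UUID.randomUUID();
        idx.add(a1, List.of("alice", "bob"));
        idx.add(a2, List.of("bob"));

        System.out.println(idx.findByOwner("bob").size());                                   // 2
        System.out.println(idx.findByAllOwners(List.of("alice", "bob")).equals(Set.of(a1))); // true
    }
}
```

A plain B-tree index on the whole `owners` value would only help with lookups of the entire array, which is why the PostgreSQL setup uses GIN for the array fields and B-tree expression indexes for scalar fields such as `id`, `name` and `createdAt`.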
This analysis was written by the Genesis Park editorial team with the help of AI. The original article can be found via the source link.