What is the Titan graph database

System information


Graph database versions

| Graph database | Version | Remarks |
| --- | --- | --- |
| Neo4J | 3.2 | |
| OrientDB | 2.2.x | |
| ArangoDB | 3.1.19 | A key invalidation problem prevents the server from downloading successfully |
| Titan | 1.0.0 | Requires a cluster; not analyzed |

Operating system and library information

  • OS: Ubuntu 16.04
  • VM: VM12
  • Python 3 drivers
    • python-arango
    • neo4j-driver
    • PyOrient
  • Plotting libraries: Matplotlib + NumPy
  • Performance monitoring library: psutil

Test information

  • Each test produces four charts:
    • Count-versus-time chart: the smaller the slope, the better the performance
    • Average CPU usage chart
    • RAM usage chart
    • Disk usage chart
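The "smaller slope is better" reading of the first chart can be made concrete by fitting a least-squares slope to (count, elapsed time) samples. The sketch below is not the original test harness: `insert_one` is a hypothetical stand-in for a driver call, and the batch size and in-memory store are arbitrary assumptions.

```python
import time


def time_insertions(insert_one, total, batch=1000):
    """Call insert_one(i) `total` times; sample (count, elapsed seconds) once per batch."""
    samples = []
    start = time.perf_counter()
    for i in range(total):
        insert_one(i)
        if (i + 1) % batch == 0:
            samples.append((i + 1, time.perf_counter() - start))
    return samples


def slope(samples):
    """Least-squares slope of elapsed time versus count (seconds per item)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den


if __name__ == "__main__":
    store = {}  # in-memory stand-in for a database (assumption)
    samples = time_insertions(lambda i: store.__setitem__(i, {"id": i}), 10_000)
    print(f"{slope(samples):.3e} seconds per inserted node")
```

A curve whose slope grows toward the end, as described for ArangoDB below, would show up as a larger slope when the fit is restricted to the last few samples.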

Graph database classification

NoSQL database categories:

  • Key-value database
  • Document-oriented database
  • Column store database (Wide Column Store / Column Family)
  • Graph database (graph-oriented)

  • All of the graph database engines tested are open source; the charts are plotted automatically

Insertion speed: ten thousand nodes and one hundred thousand edges

Insert ten thousand vertices V.

Ten Thousand Nodes - Insert Node Performance Analysis.jpg

Brief analysis

  • The three graph databases take almost the same time to insert nodes, and all perform acceptably. Ranked from fastest to slowest: OrientDB > Neo4J > ArangoDB
    • ArangoDB's node hashing can degrade node insertion performance as the number of nodes grows
  • CPU usage: Neo4J is higher than OrientDB. ArangoDB's usage rises toward the end, which matches the increase in slope at the end of its curve in the first chart. This suggests that ArangoDB's node insertion rate drops as the number of nodes grows, probably because computing hashes while storing nodes costs performance. However, the Y axis of the node insertion chart is not of the same order of magnitude as the Y axes of the later computations: ArangoDB sacrifices some insertion performance to improve subsequent performance, which is a worthwhile trade.
  • RAM usage: ArangoDB > OrientDB > Neo4J
  • Disk usage: OrientDB > Neo4J > ArangoDB

Summary

In this step of inserting a node:

  • ArangoDB builds a hash index, so node insertion is slightly slower; it uses the most memory and the least disk space.
  • Neo4J does CPU-intensive computation, and its RAM and disk usage are not high.
  • OrientDB takes the shortest time to insert nodes and uses little CPU and RAM. It benefits from building an SB-tree index, but it consumes a lot of disk space. It saves data directly as documents without preprocessing. Although both OrientDB and ArangoDB support document and graph data, OrientDB focuses on storing document data rather than analyzing graph data.

Insert one hundred thousand edges E.

Ten Thousand Nodes - Insert Edge Performance Analysis.png

Brief analysis

  • Edge insertion time, from longest to shortest: OrientDB > ArangoDB > Neo4J. This shows that, without suitable preprocessing of node connections when inserting relationships, OrientDB falls well behind ArangoDB and Neo4J.
  • CPU usage: Neo4J > OrientDB > ArangoDB
  • Disk usage: Neo4J > OrientDB > ArangoDB

Summary

  • When inserting edges, ArangoDB is still slightly slower than Neo4J, but Neo4J's CPU, RAM, and disk usage are not as good as ArangoDB's.
  • OrientDB is far behind in performance, and its free disk space fluctuates greatly during insertion; it may be generating cache files or something similar.
  • OrientDB's performance gap when inserting E is too large.

  • The neighborhood depth is 1
  • Uses the traversal algorithm built into each of the three graph databases

Traversal over ten thousand nodes

Ten Thousand Node Neighbor Node Query Performance Analysis.jpg

Analysis

  • The performance ArangoDB gives up while storing nodes pays off substantially in graph computation: both traversal and shortest path benefit. Traversal time, from longest to shortest: OrientDB > Neo4J > ArangoDB
    • Neo4J's curve is a broken line: when a node has no neighbors, Neo4J finds that out quickly. The curves for OrientDB and ArangoDB are close to straight lines.
    • When Neo4J nodes have a large number of relationships, traversal takes a long time: the more relationships, the longer it takes. It is not well suited to graphs with very many relationships per node.
  • Neo4J and ArangoDB perform best at graph traversal. Neo4J relies on CPU processing power to traverse the graph, while ArangoDB relies on memory.
  • Neither Neo4J nor ArangoDB consumes disk space during traversal, but OrientDB does.
    • The disk space OrientDB consumes during traversal may be preparation for optimizing later graph algorithms, as can be inferred from the shortest-path results.
    • It trades storage space for algorithm speed.
  • OrientDB's traversal throughput is too low.
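For reference, a depth-limited neighbor query of the kind measured here can be sketched as a breadth-first search. This is an illustrative plain-Python version over an adjacency dictionary, not the built-in traversal of any of the three databases; the function name and graph layout are assumptions.

```python
from collections import deque


def neighbors_within(adjacency, start, depth=1):
    """Breadth-first search: nodes reachable from `start` within `depth` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, dist + 1))
    return found


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(neighbors_within(graph, "a"))           # depth 1, as in the test
print(neighbors_within(graph, "a", depth=2))
print(neighbors_within(graph, "c"))           # node without neighbors
```

A node without neighbors returns immediately, which is consistent with the observation that such queries are cheap.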

  • Uses the Dijkstra algorithm built into each of the three graph databases

Shortest path over ten thousand nodes

Ten Thousand Nodes - Two Nodes Shortest Path Performance Analysis.jpg

Analysis

  • Time spent finding the shortest path between two nodes, from longest to shortest: Neo4J > OrientDB > ArangoDB
    • Neo4J's curve is still a broken line. Taking both graph algorithms together, Neo4J can quickly identify unconnected, isolated nodes because of index-free adjacency.
    • OrientDB performs much better than Neo4J on shortest path.
  • CPU usage: Neo4J > OrientDB > ArangoDB; OrientDB's usage is very stable
  • When computing shortest paths, every database except ArangoDB consumes disk space.
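As a reminder of what is being measured, here is a minimal textbook Dijkstra in plain Python. It is a sketch under the assumption of non-negative edge weights and an adjacency dictionary of `(neighbor, weight)` pairs; it says nothing about how the three databases implement their built-in versions.

```python
import heapq


def dijkstra(adjacency, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            # Walk the predecessor chain back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for nxt, weight in adjacency.get(node, ()):
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return float("inf"), []


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(graph, "a", "c"))
print(dijkstra(graph, "c", "a"))  # no route in this direction
```

When the source has no outgoing edges the queue empties after one step, so queries on isolated nodes finish almost immediately.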

Neo4J, OrientDB, and ArangoDB all create indexes by default when inserting data. Part of the performance gap comes from their different choices of index, which reflect different design ideas.

  • Neo4J: index-free adjacency; good for traversal and for graph computation on nodes that do not have a large number of relationships
  • OrientDB: focuses on the document database. Mainly because of the SB-tree index, a lot of disk space is consumed. Node insertion is about the same as in the other two databases. However, the other two optimize relationship insertion and OrientDB does not, so it lags badly there. It performs very well on graph-theoretic computation, but traversal is not yet sufficiently optimized and it is left behind.
  • ArangoDB: indexes the V and E documents to ensure fast internal lookup of document data

What "Graph First" Means for Native Graph Technology

An overview of the graph database space.png

[Oreilly Graph Databases](../Neo4J/docs/Oreilly Graph Databases.pdf), Figure 1.3

There are two main elements that distinguish native graph technology: storage and processing.

- native-vs-non-native-graph-technology

In particular, it is said that native is better than non-native.


Brief comparison

| Name | ArangoDB | OrientDB | Neo4J |
| --- | --- | --- | --- |
| Database type | multi-model DBMS | multi-model DBMS | graph database |
| Data model | Document store, Graph DBMS, Key-value store | Document store, Graph DBMS, Key-value store | Graph DBMS |
| Supported operating systems | Linux, OS X, Raspbian, Solaris, Windows | All OS with a Java JDK (>= JDK 6) | Linux, OS X, Solaris, Windows |
| Transaction support | ACID | ACID | ACID |
| Foreign keys | No | Yes | Yes |

  • ArangoDB and OrientDB both combine a document database with a graph database. V and E are stored in separate documents, and the graph is built from those documents. The differences are:
    • ArangoDB stores graphs in document-database fashion: special fields mark the document type, turning ordinary documents into the V and E documents on which the graph database is built.
    • OrientDB uses OOP inheritance to implement classes V and E.
  • ArangoDB is optimized here: the field can use a very simple hash, which has itself been optimized.
  • Neo4J stores V and E in two separate files.
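The two modelling styles can be contrasted with a small plain-Python sketch. It uses no database driver. The `_key`, `_from`, and `_to` attribute names follow ArangoDB's documented convention for edge documents, but the collection name `nodes`, the sample data, and the `Person` and `Knows` classes are hypothetical.

```python
from collections import defaultdict

# Document style: vertices and edges are plain documents in separate collections.
# The sample values below are made up for illustration.
vertices = [{"_key": "1", "name": "a"}, {"_key": "2", "name": "b"}]
edges = [{"_from": "nodes/1", "_to": "nodes/2", "label": "knows"}]


def adjacency_from_documents(edge_docs):
    """Build an out-neighbor map from edge documents."""
    adjacency = defaultdict(list)
    for doc in edge_docs:
        adjacency[doc["_from"]].append(doc["_to"])
    return dict(adjacency)


# Class style: concrete vertex and edge types inherit from base classes V and E.
class V:
    def __init__(self, **props):
        self.props = props


class E:
    def __init__(self, out_vertex, in_vertex, **props):
        self.out_vertex = out_vertex
        self.in_vertex = in_vertex
        self.props = props


class Person(V):  # hypothetical vertex class
    pass


class Knows(E):  # hypothetical edge class
    pass


print(adjacency_from_documents(edges))
edge = Knows(Person(name="a"), Person(name="b"))
print(isinstance(edge, E), isinstance(edge.out_vertex, V))
```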

ArangoDB Enterprise Edition has a SmartGraph feature that has not yet been tried.

Graph database storage data types and the relationship between complexity and flexibility:

Graphics database storage data type - the relationship between complexity and flexibility.png

ArangoDB

Advantages (see the ArangoDB FAQ):

  • Low disk space usage: data is stored in a metadata mode (Wiki: Metadata)
    • Can be accelerated by memory; low CPU usage
  • Supports master-slave clusters
  • Multi-collection transactions
  • Good extensibility through JavaScript:
    • Foxx microservices use JavaScript and ArangoDB to build applications that run inside the database, with fast access to the data.
  • AQL is very powerful; programming with it is far more convenient and flexible than with Neo4J or OrientDB
    • Neo4J's Cypher is also fairly powerful and clear, but not flexible enough
    • OrientDB's SQL-like language makes queries complicated; customization is inconvenient, and so is the built-in SQL function interface.

Disadvantages:

  • The insertion rate is slightly lower
  • ArangoDB doesn't compete with massively distributed systems like Cassandra with thousands of nodes and many terabytes of data. (ArangoDB FAQ)

Cassandra: used to store simply structured data such as an inbox. (Wiki)

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

- Apache Cassandra

Index

  • Certain attributes are indexed automatically to ensure fast V and E lookups. (ArangoDB default index)

Neo4J

Advantages

  • Cluster ready: a read/write load balancer directs requests to a cluster; [Oreilly Graph Databases](../Neo4J/docs/Oreilly Graph Databases.pdf), Figure 4.9
  • Supports transactions, locks, and a page cache; [Oreilly Graph Databases](../Neo4J/docs/Oreilly Graph Databases.pdf), Figure 6.3
  • Traversal: an index lookup usually costs O(log n), but traversing a relationship in Neo4J tends toward O(1); [Oreilly Graph Databases](../Neo4J/docs/Oreilly Graph Databases.pdf), page 151
    • Index-free adjacency: provides high-performance traversals, queries, and writes
    • Neo4j uses relationships, not indexes, for fast traversals; O(1)
    • ArangoDB published an article discussing this technique: Index Free Adjacency or Hybrid Indexes for Graph Databases;
    • Nodes are stored using "index-free adjacency", i.e. each node holds a pointer to its neighboring nodes, so a neighbor can be found in O(1) time.
    • The best storage model for graph relationships: embedded, powerful, lightweight
  • Cypher has a friendly syntax
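The O(1) versus O(log n) contrast above can be illustrated with a toy comparison. This is a conceptual sketch only, not Neo4J's storage format: `Node` and `neighbors_via_index` are hypothetical names, and a sorted edge list with binary search stands in for a tree index.

```python
from bisect import bisect_left


class Node:
    """Index-free adjacency: each node holds direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # following a reference needs no lookup structure


def neighbors_via_index(sorted_edges, name):
    """Index-based alternative: binary search a sorted (source, target) list, O(log n)."""
    i = bisect_left(sorted_edges, (name, ""))  # "" sorts before any target name
    result = []
    while i < len(sorted_edges) and sorted_edges[i][0] == name:
        result.append(sorted_edges[i][1])
        i += 1
    return result


a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors += [b, c]
edge_index = sorted([("a", "b"), ("a", "c"), ("b", "c")])

print([n.name for n in a.neighbors])         # direct pointer hop
print(neighbors_via_index(edge_index, "a"))  # lookup through the index
```

With direct references the cost of reaching a neighbor does not depend on the total number of edges, whereas the indexed lookup grows logarithmically with it.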

Disadvantages

  • Neo4j cannot store very large graphs because it does not support sharding
  • Index-free adjacency makes traversal fast, but performance when computing the shortest path between two random nodes is poor

Sharding is the method MongoDB uses to split large collections across different servers (or a cluster). Although sharding originates from relational database partitioning, it is (like most aspects of MongoDB) a completely different matter.

- What is sharding?

File storage

  • Stores relationship record array data:
    • Relationships are stored in the relationship store file.
    • Stores relationship IDs:
  • Stores relationship group data and its sequence IDs:
    • Array data for relationship groups
  • Stores relationship types and their sequence IDs:
    • Relationship type array data
  • Stores relationship type names and their sequence IDs:
    • Relationship type token array data

OrientDB

Advantages

  • Easy installation and extensive functionality
  • OrientDB is a highly scalable document-graph database management system (NoSQL database) that combines the flexibility of a document database with the linking capabilities of a graph database.
  • Optional schema-less, schema-full, or schema-mixed modes. Supports many advanced features such as ACID transactions, fast indexing, and native and SQL queries.
  • Documents can be imported and exported in JSON format.
  • Unlike a relational database, it does not perform expensive JOINs and can retrieve graphs of hundreds of linked documents in a matter of milliseconds.

Disadvantages

Storage principle

How OrientDB's local storage works: it uses a disk cache that holds disk data split into fixed-size portions (pages), together with write-ahead logging (changes to pages are first recorded in so-called durable storage). This provides the following guarantees (OrientDB 2.2.x, PLocal Engine):

  • Operations on single page are atomic.
  • Changes applied to the page can be restored after server crash even if they were not flushed to the disk.
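The second guarantee follows from logging before modifying. The toy model below illustrates the general write-ahead logging idea only; it is not OrientDB's PLocal implementation, and every name in it is hypothetical.

```python
class ToyStore:
    """Toy write-ahead log: every change is logged before the page is touched."""

    def __init__(self):
        self.log = []         # stand-in for the durable log
        self.disk_pages = {}  # pages already flushed to disk
        self.cache = {}       # in-memory page cache, lost on a crash

    def write(self, page_id, data):
        self.log.append((page_id, data))  # log first
        self.cache[page_id] = data        # then change the cached page

    def flush(self):
        """Persist the cached pages."""
        self.disk_pages.update(self.cache)

    def crash(self):
        """Simulate a server crash: the cache is lost, log and disk survive."""
        self.cache = {}

    def recover(self):
        """Replay the log over the on-disk pages to restore unflushed changes."""
        pages = dict(self.disk_pages)
        for page_id, data in self.log:
            pages[page_id] = data
        self.cache = pages
        return pages


store = ToyStore()
store.write(1, "a")
store.flush()
store.write(1, "b")  # never flushed
store.write(2, "c")  # never flushed
store.crash()
print(store.recover())
```

A real engine would also truncate the log after a checkpoint; the sketch replays the whole log for simplicity.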

These guarantees protect the data.

Default index

SB-tree index, based on the B-tree. (SB-Tree)

  • This makes the high disk usage easy to understand.
  • Speculation: a node's unique identifier @RID is the identifier of its parent node in the SB-tree index.
  • Inserting a relationship first has to fetch the nodes, which, given the SB-tree index implementation, is slower than in Neo4J and ArangoDB.