node.js, mongodb and redis on Ubuntu: performance degradation in production, RAM is free, CPU at 100%
As the question title suggests, I'm having a hard time figuring out what can be improved in my application (or tuned in the Ubuntu operating system) to achieve acceptable performance. But first, let me explain the architecture:
I store persistent data (such as users and scores) on a second server running MongoDB 3.6, with 4 GB of RAM and 2 cores.
The app has been in production for a few months (it ran on a single box until a few weeks ago) and is used by around 18,000 users a day. It has always worked very well except for one major issue: performance degradation. As the app is used, the CPU consumed by each process grows until the worker saturates (and stops serving requests). I've temporarily worked around it by checking the CPU used by every worker once a minute and restarting it when it hits 98%. So the problem here is mainly the CPU, not memory.

Memory is no longer an issue since I upgraded to socket.io 0.9.14 (the earlier version leaked memory), so I doubt it's a memory-leak problem, especially as the CPU now grows quite fast (I have to restart each worker about 10-12 times a day!). The RAM in use is also growing, to be honest, but very slowly: about 1 GB every 2-3 days. The weird thing is that it isn't released even if I completely restart the whole application; it's only released when I reboot the server! I can't really explain that...
I've now discovered Nodefly, which is amazing: I can finally see what's happening on my production server, and I've been collecting data for a few days. If anyone wants to see the charts I can give them access, but basically I can see that I handle between 80 and 200 concurrent requests! I was expecting node.js to handle thousands of requests, not hundreds. The average response time for HTTP traffic is also between 500 and 1500 milliseconds, which in my opinion is a lot. Right now, with 1,300 users online, this is the output of "ss -s":
This shows that I have a lot of closed connections in the TIME_WAIT state. I've already increased the maximum number of open files to 999999; here is the output of "ulimit -a":
So I thought the problem might be the HTTP traffic saturating the available ports/sockets (?) for some reason, but one thing doesn't make sense to me: why, when I restart the workers and all the clients reconnect within a few seconds, does the worker's CPU load drop back to 1% and serve requests properly, until it saturates again after about an hour (at peak time)?
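For reference, the TIME_WAIT pile-up can be inspected and cautiously mitigated with standard tools; a sketch, assuming a Linux box (the sysctl needs root):

```shell
# Count sockets per state; TIME_WAIT shows up in the summary
ss -s

# Allow the kernel to reuse TIME_WAIT sockets for new *outbound*
# connections (unlike tcp_tw_recycle, which is dangerous behind NAT
# and was removed from later kernels)
sysctl -w net.ipv4.tcp_tw_reuse=1
```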
Hopefully there is something obvious I'm doing wrong that someone can help me spot. Feel free to ask for more information, and I'm sorry for the length of the question, but I believe it was necessary. Thanks in advance!
After a few days of intense trial and error, I'm happy to say I've understood where the bottleneck was, and I'll post it here so other people can benefit from my findings.
The problem lies in the pub/sub connections I was using with socket.io, and in particular in the RedisStore that socket.io uses for cross-process communication of socket instances.
After realizing that I could easily implement my own version of pub/sub using Redis, I decided to give it a try and removed the RedisStore from socket.io, keeping the default in-memory store (I don't need to broadcast to all connected clients, only between 2 different users who may be connected through different processes).
Initially I declared only 2 global Redis connections per process to handle the pub/sub for every connected client, and the application used fewer resources, but I was still affected by the constant CPU growth, so not much had changed. Then I decided to try creating 2 fresh Redis connections for each client, so that their pub/sub was handled only within their own session, closing both connections once the user disconnects. After a day in production the CPUs were still at 0-5%... bingo! No process restarts, no errors, and the performance I expected. Now I can say that node.js rocks, and I'm glad I chose it for this app.
Fortunately, Redis was designed to handle many simultaneous connections (unlike Mongo). By default it is capped at 10,000 clients, which leaves room for about 5,000 concurrent users on a single Redis instance (each user now holds 2 connections), and that is fine for me for now; but I've read it can be pushed to as many as 64,000 concurrent connections, so this architecture should be solid enough in my opinion.
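That 10,000-client cap corresponds to Redis's maxclients directive; a sketch of raising it (the value is illustrative — Redis silently lowers maxclients if the process's file-descriptor limit is smaller, so raise that too):

```shell
# redis.conf — raise the connection cap
maxclients 64000

# or at runtime, without a restart:
# redis-cli config set maxclients 64000
```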
At this point I was thinking of implementing some sort of connection pool for Redis to tweak it a little further, but I'm not sure that wouldn't cause the pub/sub events to build up on the pooled connections again, unless each of them is destroyed and recreated every time to flush them.
Anyway, thanks for your answers; I'm curious to hear what you think and whether you have any other suggestions.
Not an answer as such, since your question is more of a story than a question with a single answer.
Just to say I successfully created a node.js server using socket.io that handles over 1 million persistent connections with an average message payload of 700 bytes.
The 1 Gbps NIC saturated at first, and I saw a lot of I/O latency from publish events to all clients.
Removing Nginx from the proxy role also freed up valuable memory, because reaching one million persistent connections on just ONE server is a tough exercise in tuning configuration, application, and operating-system parameters. Keep in mind that it's only feasible with plenty of RAM (around 16 GB for about 1 million websocket connections; with node.js, SockJS is ideal for low memory consumption, but for now socket.io consumes much more).
This link was my starting point for reaching that volume of connections with node. Aside from it being an Erlang app, all of the operating-system tuning is pretty much application-agnostic and should be useful to anyone aiming for a lot of persistent connections (websockets or long polling).
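The OS-level tuning in question typically comes down to a handful of limit and kernel settings; a hedged sketch (values are illustrative, not tuned for any particular box, and the sysctls need root):

```shell
# Per-process open-file limit for the shell that launches node
ulimit -n 999999

# System-wide file-descriptor ceiling
sysctl -w fs.file-max=1000000

# Widen the ephemeral port range used for outbound connections
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Larger accept backlog for the listening socket
sysctl -w net.core.somaxconn=65535
```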