Performance issues during November 2023

Between Thursday 16th and Monday 20th November, we received multiple reports of performance issues with the Reflect desktop and mobile apps. Here is an update, with simple steps to resolve them.

TLDR: If you use Reflect's desktop app, and haven’t quit it in the last week, please be sure to quit Reflect and reopen it. That way you won't encounter a serious bug we recently dealt with.

Timeframe and symptoms

Between Thursday 16th and Monday 20th November, we received multiple reports of performance issues with the Reflect desktop and mobile apps.
On desktop, the app worked slower than usual, sometimes froze, and in more serious cases didn't load at all.
On mobile, the app kept reloading itself - it would show the spinner over and over again. That usually means the app crashed in the background, and the system was trying to restart it.
On both platforms, there were issues with syncing, where changes wouldn't propagate to other devices.

It seemed something serious was going on. Since we started Reflect over 2 years ago, our unbreakable principle has been that we put reliability first.
After noticing the issue, we dropped everything and jumped on figuring out the problem.

Affected users

So far it seems this issue affected about 100 users. That's not a precise figure, because our telemetry can't really capture problems such as a slowed-down UI or freezing.
On mobile, things are a bit more precise. There we saw about 30 users experiencing cyclic reloading.
At this point I want to thank everyone who reached out to us over e-mail or Discord, and especially everyone who took time out of their day to do a debugging call with us.
In situations like this, our end-to-end encryption can be inconvenient. The only way to figure out what happened is to actually do a screen share, and look around your app.

So what happened?

When an app starts acting up, we usually look for any changes that we've made recently. Did we deploy anything that could've caused the issue?
The safest solution is often to roll back a deploy, and switch to the last version that worked fine.
In this case it was different. None of our recent changes could've caused the slowdown our users were experiencing.
And yet it seemed like the app was getting worse and worse for these affected users.

Applying commits

We started asking our users to do debugging calls with us. As they showed us around, we watched the logs through Logflare. In one of the first calls we did, we saw something suspicious:
[Screenshot: log messages captured during a debugging call]
Reflect uses Yjs for conflict resolution. After a user makes a few changes, those changes are packaged into a commit and sent to the server.
When they open Reflect on another device, they receive all new commits, and after that, their notes are up to date.
In the second message above, something strange is happening. The user is receiving multiple commits, but each commit is a little bigger than the one before:
 
  • Commit 43 - 776 changes
  • Commit 44 - 786 changes
  • Commit 45 - 800 changes
  • etc.
 
Every programmer knows to be alarmed whenever they see their app sending more and more data each time it talks to the server.
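That pattern is simple enough to check for mechanically. The sketch below is illustrative only and is not Reflect's actual code: `isGrowingSeries` and its `minLength` parameter are hypothetical names, and it assumes each commit has already been reduced to its number of changes.

```javascript
// Hypothetical helper: true when every commit in the series carries
// more changes than the one before it.
// `changeCounts` is an assumed input shape: one number per commit, in order.
function isGrowingSeries(changeCounts, minLength = 3) {
  // Too few commits to call it a trend.
  if (changeCounts.length < minLength) return false;
  return changeCounts.every(
    (count, i) => i === 0 || count > changeCounts[i - 1]
  );
}

// The three commit sizes from the log above.
console.log(isGrowingSeries([776, 786, 800])); // true
```

A series that shrinks at any point, such as `[776, 12, 800]`, is not flagged.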

How we clear local changes

Rule number one of Reflect is to never lose any data. Our syncing algorithm is deliberately careful, especially around erasing local changes.
Here's how saving a note works:
  1. When the app opens, we start listening for new commits in a note
  2. After the user stops typing, we create a new commit and send it to Firebase
  3. Even if the commit has finished saving, we don't consider that a success just yet
  4. We wait for Firebase to notify us of this new commit. Only then do we delete the local changes.
 
With this approach we were making an assumption: If saving is working, then listening to new changes must be working as well.
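To make the flow concrete, here is a minimal sketch of the four steps. It is not Reflect's real implementation: `FakeBackend` is an in-memory stand-in rather than the Firebase API, `NoteSync` and every method name are hypothetical, and it models a single device, so every notification is about that device's own commit.

```javascript
// In-memory stand-in for the server. This is NOT the Firebase API.
class FakeBackend {
  constructor() {
    this.listeners = [];
    this.commits = [];
    this.notificationsEnabled = true; // set to false to imitate a silent listener
  }
  subscribe(listener) {
    this.listeners.push(listener);
  }
  save(commit) {
    this.commits.push(commit);
    if (this.notificationsEnabled) {
      for (const listener of this.listeners) listener(commit);
    }
  }
}

class NoteSync {
  constructor(backend) {
    this.backend = backend;
    this.localChanges = [];
    this.nextCommitId = 1;
    // Step 1: start listening for new commits when the app opens.
    backend.subscribe((commit) => this.onCommitNotification(commit));
  }
  edit(change) {
    this.localChanges.push(change);
  }
  // Step 2: after the user stops typing, package local changes into a commit.
  flush() {
    const commit = { id: this.nextCommitId++, changes: [...this.localChanges] };
    // Step 3: a finished save on its own does not clear anything.
    this.backend.save(commit);
    return commit;
  }
  // Step 4: only the notification removes the changes that commit contained.
  onCommitNotification(commit) {
    this.localChanges = this.localChanges.slice(commit.changes.length);
  }
}

const backend = new FakeBackend();
const note = new NoteSync(backend);
note.edit("a");
note.edit("b");
note.flush();
console.log(note.localChanges.length); // 0

backend.notificationsEnabled = false;
note.edit("c");
console.log(note.flush().changes.length); // 1
note.edit("d");
console.log(note.flush().changes.length); // 2
```

In the second half of the usage example, saving still succeeds but the notification never arrives. The unconfirmed change "c" stays on the device and is sent again with the next commit.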

Firebase outage

At this point we were pretty worried. We couldn't pin the bug down to a deployment, and none of the timing made sense.
And then we saw it. Through analyzing our logging, we saw a spike in crashes exactly when Firebase had a database outage on Nov 16th.
First, a little context. Firebase has a realtime database product called Firestore. Subscribing to changes in Firestore looks something like this:
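import { onSnapshot } from "firebase/firestore";

// `query` is assumed to be a Firestore query built elsewhere
const unsubscribe = onSnapshot(query, (snapshot) => {
  // Update note from changes
});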
When you run this code, it'll call your callback whenever a change is made. It should handle loss of connectivity - even if you go offline for a bit, coming back online should resume notifying. (And include all the changes you missed while you were offline.)
In this case, it seems that because of the Nov 16th Firebase outage, the snapshot listener got corrupted. It hadn't been designed for outages and had no idea that it was offline. It just stopped getting new updates. We stopped getting notified of new commits being created.
This theory is supported by timeframes aligning perfectly. Issues with Reflect started appearing right after the outage ended.
So new commits were getting saved successfully, but the changes weren't coming in. The just-saved commits weren't confirmed by Firebase.
And if the commits aren't confirmed by Firebase, then they aren't erased from the user's device. The result is an ever growing array of local commits. Each time the user stops typing, we save that array again and again, larger each time.

Solution

If Firebase's onSnapshot stopped calling our callback, how can we make it start again? Luckily, all that's necessary is to reestablish the connection. This can be done by simply reloading Reflect.
We had a few debugging calls where we confirmed this approach worked.
If your Reflect app has been running for a few days, we suggest restarting it now.
The only problem is that our users didn't know they had to reload the app. So for some of them, the commits kept growing and growing.
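One way an app could notice this situation on its own, instead of relying on a manual reload, is to track how long a saved commit has gone unconfirmed. The sketch below is a hypothetical illustration, not something Reflect has shipped: `ConfirmationWatchdog`, its method names, and the 30-second timeout are all assumptions. Timestamps are passed in explicitly so the logic is easy to test.

```javascript
// Hypothetical watchdog: remembers commits that were saved but not yet
// confirmed by the listener, and reports when one has waited too long.
class ConfirmationWatchdog {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // commit id -> time the save finished (ms)
  }
  // Call when a save has finished.
  saved(commitId, now) {
    this.pending.set(commitId, now);
  }
  // Call when the listener reports the commit back.
  confirmed(commitId) {
    this.pending.delete(commitId);
  }
  // True if any saved commit has been unconfirmed for longer than the timeout.
  isStale(now) {
    for (const savedAt of this.pending.values()) {
      if (now - savedAt > this.timeoutMs) return true;
    }
    return false;
  }
}

const watchdog = new ConfirmationWatchdog(30000); // assumed timeout
watchdog.saved(43, 0);
watchdog.confirmed(43);
watchdog.saved(44, 1000);
console.log(watchdog.isStale(10000)); // false
console.log(watchdog.isStale(40000)); // true
```

When `isStale` returns true, the app could re-subscribe its listener or prompt the user to reload.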

Cleanup

Having lots of large commits is never a good thing. Some of our users experienced slowdowns or even freezing because of this. On mobile, it caused the app to reload, as it ran out of memory while trying to apply these commits.
What can be done if users can't even open the app?
Let's come back to the example of bad commits:
  • Commit 43 - 776 changes
  • Commit 44 - 786 changes
  • Commit 45 - 800 changes
  • Etc.
Each commit is a bit larger than the one before, because it contains all the changes from the previous commit plus the new changes on top.
If each commit contains all the changes of the previous one, we can safely say the last commit of the series contains all of the changes.
To fix these bricked apps, we ran a script that empties all the commits in the growing series, except for the last one. (While making a backup.)
We carefully ran this on a few users' graphs, and they confirmed it fixed the problems and didn't erase any data. We are now rolling this change out to all affected users.
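The idea behind the cleanup can be sketched roughly as follows. This is not the script we ran: `collapseGrowingSeries` and `isPrefixOf` are hypothetical names, the commit shape `{ id, changes }` is assumed, and changes are modelled as plain strings that each commit repeats in the same order.

```javascript
// True when `longer` starts with every element of `shorter`, in order.
function isPrefixOf(shorter, longer) {
  return (
    shorter.length <= longer.length &&
    shorter.every((change, i) => change === longer[i])
  );
}

// Hypothetical cleanup: empties every commit in a growing series except
// the last one, and returns a backup copy of the originals.
function collapseGrowingSeries(series) {
  // Safety check: each commit must contain everything from the previous one.
  for (let i = 1; i < series.length; i++) {
    if (!isPrefixOf(series[i - 1].changes, series[i].changes)) {
      throw new Error(
        `Commit ${series[i].id} does not contain commit ${series[i - 1].id}`
      );
    }
  }
  const backup = series.map((c) => ({ ...c, changes: [...c.changes] }));
  const cleaned = series.map((c, i) =>
    i === series.length - 1
      ? { ...c, changes: [...c.changes] }
      : { ...c, changes: [] }
  );
  return { cleaned, backup };
}

const series = [
  { id: 43, changes: ["a", "b"] },
  { id: 44, changes: ["a", "b", "c"] },
  { id: 45, changes: ["a", "b", "c", "d"] },
];
const { cleaned, backup } = collapseGrowingSeries(series);
console.log(cleaned.map((c) => c.changes.length)); // [ 0, 0, 4 ]
console.log(backup.map((c) => c.changes.length)); // [ 2, 3, 4 ]
```

The check at the top makes the script refuse to touch a series where an earlier commit holds something the next one lacks, which is the condition that makes the cleanup lossless.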

Next steps

Our overly careful syncing algorithm has its flaws, and this bug exposed them magnificently. We'll revisit it, and figure out how to improve it.
We'll also add more robust handling when Firestore has an outage.
Lastly, we've added a mechanism to Reflect that lets us prompt you to reload the app. This prompt will now appear whenever there's a new app update.
I just want to say a huge thanks to our patient customer base. We understand how important your notes are to your day. For us, reliability always comes first ahead of new features. We will do better.
One last note: if you are having any issues with Reflect, please don't suffer in silence. Reach out to us either on Discord or via our support email address. We pride ourselves on having lightning-fast support with technical people who will always know the answer to your question.
 

Written by

Alex MacCaw

Founder and CEO of Reflect