@Danstan - we manage thousands of workbooks, and since sessions are per workbook ID, we have to open as many sessions as there are unique workbooks to update. If you can trace requests by app ID, ours is 4d42f407-c4ef-49bb-ac28-e14c32f843ff. Alternatively, I can send you a log of the ~500 most recent requests, about half of which fail with this error.
The error seems incredibly random and is still happening now.
Simplified flow (tenant authorizes offline access; all requests are wrapped in try/catch):
1. Get client access token from refresh token if needed
2. Get Drive info
3. Get or create folders (app stores data in specific folders)
4. Upload pdf/png/jpg file(s) (app always uploads several files) and get uploaded file IDs
5. Check if Excel file exists (tenants can have multiple files that need to be updated)
6. IF file exists -> Get {File ID} ELSE Upload a new Excel file -> get {File ID} in response
7. Create a workbook session for {File ID}
8. IF session created -> add {Session ID} to headers for subsequent requests ELSE proceed without Session ID
9. Update Excel file: add a row to a sheet
10. IF session created -> close session {Session ID}
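For reference, here is a minimal sketch of the session part of the flow (create session, attach the session header, add a row, close the session). It is not our production code. The `createSession` and `closeSession` actions, the `workbook-session-id` header, and the table `rows/add` action are the documented Graph ones; the function names, the injectable `send` transport, and the assumption that the row goes into a named table are purely illustrative.

```python
import json
import urllib.error
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def http_transport(method, url, headers, body):
    """Hypothetical default transport: returns (status, parsed JSON or None)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else None)
    except urllib.error.HTTPError as err:
        raw = err.read()
        try:
            return err.code, (json.loads(raw) if raw else None)
        except ValueError:  # error body was not JSON
            return err.code, None


def add_row_with_session(token, item_path, table, row, send=http_transport):
    """Try to create a session, add one row, then close the session if one exists.

    item_path is e.g. "/me/drive/items/{File ID}"; table is an assumed table name.
    """
    base = f"{GRAPH}{item_path}/workbook"
    headers = {"Authorization": f"Bearer {token}",
               "Content-Type": "application/json"}

    # Create a session; a failure here is tolerated, not fatal.
    status, payload = send("POST", f"{base}/createSession", headers,
                           {"persistChanges": True})
    ok = 200 <= status < 300 and payload is not None
    session_id = payload.get("id") if ok else None

    # Only attach the session header when the session was actually created.
    if session_id:
        headers = {**headers, "workbook-session-id": session_id}

    try:
        # Add the row, with or without a session.
        status, _ = send("POST", f"{base}/tables/{table}/rows/add", headers,
                         {"values": [row]})
        return {"session": session_id is not None, "status": status}
    finally:
        # Close the session even if the update raised.
        if session_id:
            send("POST", f"{base}/closeSession", headers, None)
```

The caller inspects the returned status to decide whether to retry later, for example when the update is rejected because the workbook is locked.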
Note that this error does not prevent us from actually updating the workbook UNLESS the workbook is open in a user's browser, in which case we get a "workbook is locked" error in step #9 (and we then retry at a later time). This, however, poses a significant problem, since many of our clients want to see "real time" updates in their Excel spreadsheets, which is ONLY possible if the session is created properly.
A transient underlying/inner error would suggest that the workload is not available; however, it ONLY affects the sessions "endpoint". File upload, folder creation, and Excel updates all perform flawlessly. Within any given time period, successful session creates and session create errors are interleaved completely randomly: e.g. we can have 20 successful sessions followed by 5 errors, or a sequence of alternating successes and errors. The request rate does not seem to matter: 5 requests per minute have pretty much the same percentage of failures as 5 requests per second. All failures return this specific error.
Another interesting point (apart from a message that is weird for a REST API) is that this error seems to happen randomly on both existing and newly created files. If we have just uploaded a blank file (step #6), there cannot be any other session on it except the one we are trying to create, and we explicitly close open sessions. Most files are small (and new ones are "empty", with no rows). We tried pausing for up to 5 seconds after uploading a new file before creating a session on it, but that made no difference: some attempts would error and some would succeed.
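The pause experiment can be sketched roughly as follows. This is a simplified illustration, not the exact test we ran: it probes one workbook with several pause lengths in sequence, whereas we paused after each fresh upload. The `send` transport, `workbook_url`, and the function name are hypothetical; `send(method, url, headers, body)` is assumed to return `(status, parsed_json_or_None)`.

```python
import time


def session_success_by_pause(send, token, workbook_url,
                             pauses=(0, 1, 2, 5), sleep=time.sleep):
    """For each pause length: wait, attempt createSession once, record the outcome."""
    headers = {"Authorization": f"Bearer {token}",
               "Content-Type": "application/json"}
    results = {}
    for pause in pauses:
        sleep(pause)  # wait before attempting to create the session
        status, payload = send("POST", f"{workbook_url}/createSession",
                               headers, {"persistChanges": True})
        created = 200 <= status < 300 and bool(payload and payload.get("id"))
        results[pause] = created
        if created:
            # Close immediately so the next attempt starts with no open session.
            send("POST", f"{workbook_url}/closeSession",
                 {**headers, "workbook-session-id": payload["id"]}, None)
    return results
```

If the pause mattered, the longer pauses would succeed noticeably more often than the shorter ones; in our case the outcomes looked the same regardless of the pause.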
Thanks for looking into this.