I would like to propose that the response include the time a request had to wait when the queue was empty, either before it could return with a message once one arrived in the queue, or before the long poll timed out and it returned with no messages.
I have an algorithm that rate/impedance matches by predicting the upstream and downstream consumption rates, so that I run the right number of threads to keep the pipelines running at optimal capacity with no stalling, while accounting for bad latency.
The problem with long polling is that when there are no messages in the upstream queue, the time spent waiting for messages to land in the IBM MQ queue before the long poll request returns is included in the client application's measurement of how long the request took to complete, because the request/response is a black box to us.
When the queue is empty and the request has to wait before returning with either a timeout or a delayed message, it would be great if the queue manager side could time how long it had to wait for messages to arrive before it responded with them.
That value could then be subtracted from the client application's measured RTT, so that we always have the real RTT when messages are present in the queue and a true picture of IBM MQ's performance.
This would let us run even more accurate predictions. As it stands, if messages appear to take longer than they really do, we end up running more threads than necessary.
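To illustrate why an inflated RTT matters, the relationship at play is essentially Little's law: required concurrency is roughly target throughput multiplied by per-request latency. A minimal sketch (the class and numbers are illustrative only, not my actual implementation):

```java
// Illustrative only: Little's law says concurrency ≈ throughput × latency.
// If the measured RTT silently includes long-poll stall time, the estimate
// overshoots and we start more threads than the pipeline actually needs.
public final class ThreadEstimator {

    /**
     * @param targetMsgsPerSecond rate the downstream consumers must sustain
     * @param avgRttSeconds       average per-message round-trip time (stall excluded)
     * @return number of concurrent getter threads to run
     */
    public static int requiredThreads(double targetMsgsPerSecond, double avgRttSeconds) {
        return (int) Math.ceil(targetMsgsPerSecond * avgRttSeconds);
    }

    public static void main(String[] args) {
        // 500 msgs/s at a true 10 ms RTT needs about 5 threads...
        System.out.println(requiredThreads(500, 0.010));
        // ...but if long-poll stalls inflate the observed average to 200 ms,
        // the same formula asks for 100 threads that are not really needed.
        System.out.println(requiredThreads(500, 0.200));
    }
}
```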
Request -> Server
---------------> Do we have messages?
---------------> If yes: return immediately with LongPollWaitDelay/Stall = 0
Response <-
---------------> If no:
---------------> Start a timer and wait, up to the long poll maximum, for messages to be retrieved. (You already have a long poll timer for the timeout, so just use it to return this value in the response when a message arrives.)
---------------> Message arrives in queue = Yes
---------------> Stop the timer and add the time spent waiting for a message to land in the queue to the response payload along with the messages (LongPollWaitDelay/Stall = timer value)
Response <-
---------------> Message never arrived, timeout
Response <- Timeout (LongPollWaitDelay/Stall = irrelevant, since it responds with no messages)
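To make the intent concrete, here is a rough sketch of how a client could use such a field. LongPollWaitDelay/Stall is the proposed value and does not exist in MQ today; the sketch only shows the subtraction it would make possible:

```java
// Hypothetical sketch: the LongPollWaitDelay/Stall value is the proposed field,
// not an existing MQ feature. It only illustrates the client-side correction.
public final class RttCorrection {

    /**
     * @param measuredRttMs   wall-clock time the client saw for the whole get
     * @param longPollStallMs proposed value returned by the queue manager:
     *                        how long the get sat waiting on an empty queue
     * @return round-trip time attributable to MQ itself, with the stall removed
     */
    public static long trueRttMs(long measuredRttMs, long longPollStallMs) {
        return Math.max(0, measuredRttMs - longPollStallMs);
    }

    public static void main(String[] args) {
        // The queue was empty for 300 ms before a message arrived; the client
        // measured 312 ms end to end, so the real retrieval cost was ~12 ms.
        System.out.println(trueRttMs(312, 300)); // -> 12
    }
}
```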
So I am putting this forward as a feature idea/feature request, as it would allow any application making use of the long poll concept to achieve greater precision in its rate matchers and predictions.
-------
When getting a message from the queue using long polling, where a long poll wait could be 1000 ms, there is an issue when trying to measure the round-trip time and message retrieval time, because the measurement includes the long poll stall time while the get was waiting for messages to arrive at the server.
In a highly performant application I have written, I compute windowed rates to determine the number of threads based on the upstream and downstream consumption rates.
For this, I need to exclude the time that a GetMessage spent waiting for a message to arrive when the queue was empty, since what I want to measure is IBM MQ's actual performance for retrieving messages.
Is there any metric that IBM returns somewhere for how long a GetMessage with the MQGMO_WAIT flag had to wait for a message to arrive? If the queue had messages available it would return a value of 0 ms; if it had to wait 300 ms before the first message could be read from the queue and pumped over the wire, then the response header included with the messages would indicate that the GetMessage stalled for 300 ms waiting for a message to arrive.
That way I can keep a timer around the GetMessages request and subtract the LONGPOLLWAITSTALLEDMS value from it, getting the true message RTT/retrieval time without the stall.
This is difficult to do when the message load is sporadic, or when messages are processed quickly.
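For reference, this is roughly what a waited get looks like today with the IBM MQ classes for Java; the wall-clock timer around it is the only measurement available, and it cannot separate the stall from the retrieval. Queue manager and queue names are placeholders and the connection setup is simplified:

```java
import com.ibm.mq.MQException;
import com.ibm.mq.MQGetMessageOptions;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

public class WaitedGetTiming {
    public static void main(String[] args) throws MQException {
        // Placeholder names; real connection/channel setup omitted for brevity.
        MQQueueManager qmgr = new MQQueueManager("QM1");
        MQQueue queue = qmgr.accessQueue("APP.QUEUE", CMQC.MQOO_INPUT_AS_Q_DEF);

        MQGetMessageOptions gmo = new MQGetMessageOptions();
        gmo.options = CMQC.MQGMO_WAIT | CMQC.MQGMO_FAIL_IF_QUIESCING;
        gmo.waitInterval = 1000;            // long-poll style wait of 1000 ms

        MQMessage msg = new MQMessage();
        long start = System.nanoTime();
        try {
            queue.get(msg, gmo);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // elapsedMs mixes together (a) time spent waiting for a message to
            // appear on an empty queue and (b) the actual retrieval cost; there
            // is currently no field in the GMO that separates the two.
            System.out.println("get took " + elapsedMs + " ms");
        } catch (MQException e) {
            if (e.reasonCode == CMQC.MQRC_NO_MSG_AVAILABLE) {
                System.out.println("wait interval expired with no message");
            } else {
                throw e;
            }
        }
    }
}
```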
Hi Wesley,
You are correct: there is no MQI output field in the MQGMO that contains the time it took for a suitable message to be returned when using MQGMO_WAIT.
I guess it would depend on what you are really trying to measure. For an application, the cost of retrieving a message is the sum of the request being passed to the queue manager, a suitable message being found, and the message being copied back to the application via the bindings layer. I’m not quite sure how timing information on each phase would help an application figure out whether enough servicing threads were started, which I think is ultimately what you are trying to achieve?
If you need to know whether messages are being serviced quickly, then you might find queue service interval events useful (https://www.ibm.com/docs/en/ibm-mq/9.3?topic=events-queue-service-interval), or some of the queue status fields may be of use (https://www.ibm.com/docs/en/ibm-mq/9.3?topic=monitoring-queues), such as MSGAGE.
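For example, those queue status fields can be read programmatically with a PCF Inquire Queue Status command, along these lines (a sketch only; queue manager and queue names are placeholders, and error handling is omitted):

```java
import com.ibm.mq.constants.CMQC;
import com.ibm.mq.constants.CMQCFC;
import com.ibm.mq.headers.pcf.PCFMessage;
import com.ibm.mq.headers.pcf.PCFMessageAgent;

public class QueueStatusSketch {
    public static void main(String[] args) throws Exception {
        // Connects in bindings mode to a local queue manager; placeholder names.
        PCFMessageAgent agent = new PCFMessageAgent("QM1");
        try {
            PCFMessage request = new PCFMessage(CMQCFC.MQCMD_INQUIRE_Q_STATUS);
            request.addParameter(CMQC.MQCA_Q_NAME, "APP.QUEUE");
            request.addParameter(CMQCFC.MQIACF_Q_STATUS_TYPE, CMQCFC.MQIACF_Q_STATUS);

            for (PCFMessage response : agent.send(request)) {
                int depth = response.getIntParameterValue(CMQC.MQIA_CURRENT_Q_DEPTH);
                int oldestMsgAgeSec = response.getIntParameterValue(CMQCFC.MQIACF_OLDEST_MSG_AGE);
                System.out.println("depth=" + depth + ", oldest message age=" + oldestMsgAgeSec + "s");
            }
        } finally {
            agent.disconnect();
        }
    }
}
```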
In summary, this type of performance analysis and optimization is not something that MQ would typically include in its messaging API.
When an MQGET is blocked waiting for a message, it returns to the application immediately when a message that it can retrieve appears on the queue. So it's really unclear what you are looking for.
In any case, we do not think that extending timer information is something that we would prioritise for development. Therefore this item is being declined.
Good Day,
The reason is really to be able to calculate the number of threads one needs to have active, to ensure that the downstream consumers of the pipelines of buffers and pools don't stall. It is a statistical windowing approach I have going, which has been working really well: on 15-year-old hardware I am now doing around 15 million messages a day with no issues.
I had to re-work the code for newer data centres and cloud, where round-trip time and latency are around 10-20 ms on the MQ side, which basically meant we were not able to process the traffic load once it reached around 3-4 million messages a day.
The same code is now running on the old hardware. Attempting to improve the auto-calibration has made things more generic and chainable as a library that others can just use.
Right now, what I have implemented is a 15% threshold: if a sample deviates from the window average by more than that, I drop the sample, as it presumably includes long poll time.
Having the amount of time that the long poll stalled for returned in the response would make this a lot simpler and go a long way toward optimally utilizing resources for auto-calibration.
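For illustration, the workaround described above is roughly this shape of filter (a sketch, not my actual code; the class name and window size are made up, the 15% threshold is as described):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of the current workaround: keep a rolling window of RTT samples and
 * drop any sample that deviates from the window average by more than a
 * threshold (e.g. 15%), on the assumption that it includes long-poll stall
 * time rather than real retrieval cost.
 */
public class RttSampleWindow {
    private final Deque<Long> window = new ArrayDeque<>();
    private final int maxSamples;
    private final double threshold;   // e.g. 0.15 for 15%

    public RttSampleWindow(int maxSamples, double threshold) {
        this.maxSamples = maxSamples;
        this.threshold = threshold;
    }

    /** @return true if the sample was accepted, false if dropped as an outlier */
    public boolean offer(long rttMs) {
        double avg = average();
        if (!window.isEmpty() && Math.abs(rttMs - avg) > avg * threshold) {
            return false;   // likely inflated by a long-poll stall; discard
        }
        window.addLast(rttMs);
        if (window.size() > maxSamples) {
            window.removeFirst();
        }
        return true;
    }

    public double average() {
        return window.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }
}
```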