EPISODE 4 Microservices Using Spring Boot and Spring Cloud – Part 1
November 03, 2020 | 35 min 53 sec
Podcast Host – Vinayak Joglekar, CTO at Synerzip
Podcast Guest – Vaibhav Patil, Engineering Manager at Synerzip
Brief Summary
Our co-host Vinayak Joglekar, CTO at Synerzip, continues talking to our guest speaker, Vaibhav Patil, about building microservices applications using Spring Boot and Spring Cloud.
Podcast Transcript
Madhura Gaikwad
Hello everyone. I’m Madhura Gaikwad, and you’re listening to Zipradio podcasts powered by Synerzip. In today’s episode, we are continuing our discussion from the previous episode on microservices using Spring Boot and Spring Cloud. Vinayak Joglekar, our CTO, is in conversation with microservices expert Vaibhav Patil. So let’s continue with this session. Welcome onboard, guys.
Vinayak Joglekar (00:32):
Thanks, Madhura. Thanks for the introduction. This is the second part of the episode on microservices using Spring Cloud. Thank you for being on this podcast for the second consecutive episode. For our listeners, let me recall that in the last episode we covered many aspects of Spring Boot and Spring Cloud, including how configuration servers can be used and how discovery happens. Then we talked a little bit about the services offered by Netflix’s libraries, such as Hystrix, and patterns such as circuit breaker and bulkhead. And towards the end of the last episode we also talked about using filters with Zuul. Vaibhav, we left out something there that is a very good use of the Zuul service. Very often we launch a new product in which we might not want to expose a new feature or functionality to everybody. A new version of a service, say version 2, which has the same functionality in an improved form, should be exposed to, let’s say, X percent of the population to start with. Then you gradually increase this percentage, or the load, and reduce the percentage of people who are seeing the old service. This is very useful because if you notice that the new version of the service is not being liked, or you’re not getting the same response or the same conversion rates, you might roll it back. You might decide to go backwards from, let’s say, 30% of the users seeing the new service to 20% or 10%, and then just shut it down. Or if you are getting a better response than with your existing service, you might decide to roll forward by increasing from 30% to 40%, 50% and so on, finally to 100%, and then shut down the old service.
So this is a typical application that you can use a Zuul filter for. Can you explain how Zuul can be used for incorporating such functionality in your application?
Vaibhav Patil (03:05):
Sure. So this is another example of implementing a microservice. Here we have to implement a kind of special route service. The job of that special route service is to accept the service name as an input parameter, and it has its own database holding the information about the special routes for that particular service: the weight, and whether the route is active or not. Whenever Zuul gets a request for service A, the developer has to implement a dynamic route filter in the API gateway. The job of that filter is to call the special route service using the service ID. The response of that service is basically the new route that we have to forward to, and the weight of that route. Based on the weight, our dynamic filter can decide what percentage of requests to send to the new route. We can implement that by creating a random number, which is compared with the weight that is already set in the database. Based on this comparison, we can either forward to the new route or keep using the existing route. The only thing here is that if we want to navigate to those special routes, the developer has to write the implementation. Zuul does not directly forward to the new route, so that part of the implementation is the developer’s responsibility.
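A minimal sketch of the random-number comparison described above, in plain Java with no Zuul dependency. The class name `WeightedRouteChooser`, the method `choose`, and the example paths are hypothetical; in a real gateway this decision would sit inside a custom route filter, and the weight would come from the special route service rather than a method argument.

```java
import java.util.Random;

// Hypothetical sketch: none of these names are Zuul APIs.
class WeightedRouteChooser {
    private final Random random;

    WeightedRouteChooser(Random random) {
        this.random = random;
    }

    /**
     * Picks a route for one request.
     * weightPercent is the share of traffic (0-100) that should go to the new route.
     */
    String choose(String existingRoute, String newRoute, int weightPercent) {
        // Draw a number in [0, 100); draws below the weight go to the new route.
        int draw = random.nextInt(100);
        return draw < weightPercent ? newRoute : existingRoute;
    }

    public static void main(String[] args) {
        WeightedRouteChooser chooser = new WeightedRouteChooser(new Random());
        int toNew = 0;
        for (int i = 0; i < 10_000; i++) {
            if (chooser.choose("/v1/orders", "/v2/orders", 30).equals("/v2/orders")) {
                toNew++;
            }
        }
        // With a weight of 30, roughly 3000 of 10000 requests should be redirected.
        System.out.println("Requests sent to new route: " + toNew + " of 10000");
    }
}
```

Because each request is decided independently, the split is only approximately 30/70 over a short window; it converges as traffic grows.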
Vinayak Joglekar (04:25):
So can you explain why Zuul is not forwarding? I mean, the primary objective of using Zuul is to have a single point of entry from where you’re automatically directed to the needed service, right? So why in this case does the developer have to take care of the re-routing?
Vaibhav Patil (04:48):
Yes, Zuul internally uses a routing mechanism which is based on the incoming request. But in this particular scenario, we are trying to change the request, maybe with additional URL parameters or maybe the end URL itself. That part is not handled in the existing Zuul implementation. That is the reason the developer has to implement his own logic to call the special route. And in that case, all the related handling is something developers have to take care of on their own.
Vinayak Joglekar (05:21):
Now, you also said that there is a place in the configuration database where the weight is stored. And that weight would be a fixed number, such as, let’s say 30% or 0.3. Now I have to increase this gradually as a function of time. And I don’t want to every now and then enter this value in the database and change it from 30 to 40 and 30 to 20. I mean, I might just say that look, if left to itself, it should bump itself up by 5%, every two hours or something like that. So is it possible to write such a function, configuration function? Or is it mandatory to store a static value in the database?
Vaibhav Patil (06:02):
Well, the special route service, being an independent service, will have its own logic. If it is to be configurable, that data can be stored in the MySQL database, or whatever database it is using. And if we want, we can write a scheduler job to keep checking the database and keep updating the weight.
Vinayak Joglekar (06:27):
Now I might want to just use the system time to calculate the number of hours. Let’s say I have 30 in the database along with a timestamp. I use the system time, find that 4 hours have passed, and bump it up by 5% to 35%. Isn’t that possible? What I’m asking is whether some logic like this can be incorporated in the service that does the redirecting.
Vaibhav Patil (06:53):
It is possible, but ideally the job of that service is to tell its clients what they are supposed to do for that particular route. So in my opinion, instead of using the same service to update something in the database, I would rather have some other client modify the existing routes which are stored in the special route service database, instead of having that logic implemented in the service. Because suppose later, after a month, I want to change that logic; then the only way to do that is to modify the special route service. So instead of doing that, I would prefer having some other client work with the special routes.
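One way to picture the time-based bump being discussed is as a small pure function that an external client or scheduler could call before writing the result back to the special route service. This is only a sketch: `WeightRamp`, `weightAt`, and the example rule of 5 points every 2 hours are assumptions for illustration, not part of any Spring or Zuul API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for a separate client that updates the stored weight.
class WeightRamp {

    /**
     * Returns the weight (0-100) in effect at 'now', given the weight that was
     * set at 'startedAt' and a rule such as "add 5 points every 2 hours".
     * A negative stepPercent models a gradual rollback.
     */
    static int weightAt(int startPercent, Instant startedAt, Instant now,
                        int stepPercent, Duration stepInterval) {
        long intervalMs = stepInterval.toMillis();
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("stepInterval must be at least 1 ms");
        }
        if (now.isBefore(startedAt)) {
            return startPercent;
        }
        // Whole intervals elapsed since the weight was first set.
        long steps = Duration.between(startedAt, now).toMillis() / intervalMs;
        long weight = startPercent + steps * stepPercent;
        // Clamp so the result is always a valid percentage.
        return (int) Math.max(0L, Math.min(100L, weight));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-11-03T08:00:00Z");
        Instant fourHoursLater = start.plus(Duration.ofHours(4));
        // 30% at the start, +5 points every 2 hours: two steps have elapsed.
        System.out.println(weightAt(30, start, fourHoursLater, 5, Duration.ofHours(2))); // prints 40
    }
}
```

Keeping the rule outside the special route service matches the preference stated above: the ramp logic can change without redeploying the service that answers routing lookups.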
Vinayak Joglekar (07:35):
Makes sense. Yeah, because in that case I don’t have to touch the special routes database. So, thanks for completing this discussion on Zuul. Now, very often we are faced with a problem which is unique to the world of microservices: the problem of scaling. We always say that microservices are used for scaling. And very often it is a problem because there is a lot of database interaction in the applications, which causes slowness, because taking a database connection or committing a write or an update operation is time-consuming. That is one of the reasons for slowness in applications. For scaling, one of the primary concerns is that the service has to be stateless, because if you have some state stored inside the service, then you can’t just replicate that service; it will not function the same way when it is used for two different requests. A stateless microservice can be scaled out by creating multiple instances of that service. A stateful service is more difficult to scale, because it holds a certain state and will not work the same way across two different requests. So how is this problem solved? What is the architectural pattern that we use to solve this problem of scaling horizontally?
Vaibhav Patil (09:24):
Yes. As you correctly mentioned, microservices are meant to be stateless. If at any point we feel that we have to implement a stateful service, then microservices are not an option for that. Coming back to your question about how to scale such operations, there is a pattern called CQRS, command query responsibility segregation, which is implemented along with event sourcing. As the name suggests, we separate the command operations, which means the create, update and delete operations on a particular store, from the query operations. The benefit of implementing this is that you can scale your application, and basically you enable the application to be forward compatible. Let’s take a hypothetical example of a Facebook application where a user updates his profile picture. If we observe closely, the update is not immediately visible to the other users who are using Facebook, so that picture may not be readily available to them.
So in that case, it may be that there is one centralized database maintained by Facebook, and the service which updates the profile picture will just update its own database. But there are different dependent systems. So that application’s responsibility is to publish a message, and the dependent systems which need to update this particular profile will consume those messages and update their own systems accordingly. Now, the CQRS pattern is, as the name suggests, command query responsibility segregation. We separate the command operations, which are create, update and delete, from the query operations. The idea here is that the load on command operations could be different from the load on query operations. In this particular case, creating or updating a profile picture happens once in a while, but there are several requests being made to the query service for fetching that profile picture. So the scaling of the query service would be different from the scaling of the command service. That is the CQRS pattern, and it is often implemented using event sourcing.
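The separation described above can be sketched in a few lines of plain Java. Everything here is an illustrative assumption: the class `ProfilePictureCqrs`, the `PictureUpdated` event, and the in-memory maps and queue standing in for the command database, the query-side cache, and the message broker.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical single-process model of a command side and a query side.
class ProfilePictureCqrs {

    /** Event published by the command side after each successful write. */
    static final class PictureUpdated {
        final String userId;
        final String pictureUrl;

        PictureUpdated(String userId, String pictureUrl) {
            this.userId = userId;
            this.pictureUrl = pictureUrl;
        }
    }

    // Write model: owned by the command service.
    private final Map<String, String> commandStore = new HashMap<>();
    // Read model: owned by the query service (a cache in practice).
    private final Map<String, String> queryCache = new HashMap<>();
    // Stand-in for a message broker between the two sides.
    private final Queue<PictureUpdated> pendingEvents = new ArrayDeque<>();

    /** Command: update the write store and publish an event. */
    void updatePicture(String userId, String pictureUrl) {
        commandStore.put(userId, pictureUrl);
        pendingEvents.add(new PictureUpdated(userId, pictureUrl));
    }

    /** Query: served only from the read model, never from the write store. */
    String getPicture(String userId) {
        return queryCache.get(userId);
    }

    /** Simulates the query side consuming the published events. */
    void deliverEvents() {
        PictureUpdated event;
        while ((event = pendingEvents.poll()) != null) {
            queryCache.put(event.userId, event.pictureUrl);
        }
    }

    public static void main(String[] args) {
        ProfilePictureCqrs app = new ProfilePictureCqrs();
        app.updatePicture("u1", "old.png");
        app.deliverEvents();
        app.updatePicture("u1", "new.png");
        System.out.println(app.getPicture("u1")); // prints old.png (event not yet consumed)
        app.deliverEvents();
        System.out.println(app.getPicture("u1")); // prints new.png
    }
}
```

The window between `updatePicture` and `deliverEvents` is the period in which other users still see the old picture.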
Vinayak Joglekar (11:54):
Yeah, so let me repeat my understanding. What you’re saying is that the read and write operations are scaled separately. So I have a service which will only be reading, and that can be stateless, because if I fire the same read operation 100 times, it will return the same result 100 times. There is no change of state, and these things can scale independently. But with the CRUD operations, it is not the same. So what you’re saying is that instead of having the same database used for read and write, you have two different databases, right? You’re writing to a database which is used by the command service, which is doing the CRUD operations, whereas you’re reading from another database, which is not strictly consistent. So going back to the CAP theorem, you are actually sacrificing consistency in the interest of availability, so that your read operations are not blocked. Let’s say there are hundreds of your friends who want to see your picture on Facebook, and at the same minute you are updating your picture on Facebook. Those hundred threads are not blocked until you finish the upload, right? They will keep seeing your old picture, and when your new picture is available, it will eventually become consistent. Maybe zone-wise or area-wise, these updates would be carried into other databases. So that’s what you mean by the CQRS pattern or the event sourcing pattern, right?
Vaibhav Patil (13:44):
That is correct. Usually the query service hardly interacts with any relational database or any MySQL database. The query service should usually interact with some cache store for faster performance, because if there are millions of requests for that particular service, it doesn’t make sense to send so many requests to physical storage. Instead it makes sense to have that data replicated in the cache store. That is where the CQRS pattern comes into the picture: when you create or update anything in the database, you send a notification telling whatever systems are dependent on this particular message to take care of their own data. All those dependent systems get that notification, and it’s their responsibility to make updates in their own system, in their own cache. That is how it is implemented.
Vinayak Joglekar (14:36):
You mentioned that you might not want to use a relational database because you don’t need the full functionality of all the SQL updates and deletes. All you are doing is retrieving some data for a given key or ID. So it would make sense not to use a relational database for the read or query operations. Instead you might want to use a cache store such as Redis, right? So that you are able to retrieve the content based on the ID instead of firing a query. Which means you have, let’s say, MySQL as your source on one hand, and on the other hand you also have Redis, so two different technologies or two different databases. Then this can be extended further, where you might want to use, let’s say, a graph database like Neo4j while dealing with graph data, maybe a columnar database for your column data, MongoDB for documents, and so on. That opens up an immense number of possibilities: you have a microservice, and depending on the type of microservice, you use a different type of database. Now in this picture, how will you make sure that the data, or an event that happens, is consistently recorded in all the databases? Because there has to be something that holds this together, right? Otherwise each database will go in its own direction and they will not be consistent with each other. So how do you make sure that they are consistent with each other?
Vaibhav Patil (16:34):
There are certain message broker implementations; Kafka and RabbitMQ are a couple of examples. It’s the responsibility of these message brokers to get the message from producers and make sure that the message is delivered to the appropriate consumer. There are various patterns to implement this messaging. We can implement publisher-subscriber, where there will be one publisher and multiple subscribers receiving the same event. In this particular case, if I’m doing updates to some database and that is an event on which other systems are dependent, the producer can produce a message saying that this particular record is updated, now you take care of your service. So there will be different consumers. One consumer could be the query service, which will make updates in the Redis cache. There will be another service which will make updates to its own database; it could be MongoDB or any other database. So this is one way. The other is point-to-point communication, which is the typical queue-based communication. Kafka and RabbitMQ are well-known examples of message brokers.
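The publisher-subscriber arrangement can be illustrated with a tiny in-memory stand-in for a broker. This is a sketch only: `InMemoryBroker`, the topic name `profile-updated`, and the three subscribers are hypothetical, and the code uses no Kafka or RabbitMQ client API. Delivery here is synchronous, which a real broker would not be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory publish/subscribe hub; not a real broker client.
class InMemoryBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** Registers a handler that receives every message published to the topic. */
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    /** Delivers the message to every subscriber of the topic; returns the delivery count. */
    int publish(String topic, String message) {
        List<Consumer<String>> handlers =
                subscribers.getOrDefault(topic, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }

    public static void main(String[] args) {
        InMemoryBroker broker = new InMemoryBroker();
        // One event, several independent consumers, each updating its own store.
        broker.subscribe("profile-updated", id -> System.out.println("cache refreshed for " + id));
        broker.subscribe("profile-updated", id -> System.out.println("document store updated for " + id));
        broker.subscribe("profile-updated", id -> System.out.println("analytics recorded for " + id));
        broker.publish("profile-updated", "user-42");
    }
}
```

The publisher never names its consumers, so adding another subscriber requires no change on the publishing side.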
Vinayak Joglekar (17:51):
So what you’re saying is that your events are published to a message broker, and they are consumed by different databases that subscribe to those events, whatever the application may be. Coming back to our example of uploading your Facebook picture, that could be an event that gets published to the Kafka queue. It is consumed by a service, let us say, that creates a thumbnail used in other places; in your Facebook news feed, for example, your thumbnail is used instead of the complete profile picture. So there is a separate microservice which subscribes to the same event, takes this new photo of yours and starts creating the thumbnail. At the same time, there may be another service which publishes the fact that you have changed your profile picture to all your friends and creates the news feed based on that. That would be another service subscribing to the same event. So you’ll have one event and multiple subscribers, and each subscriber is likely to take its own course to update its own data.
Vaibhav Patil (19:03):
One subscriber could be the analytics running on that particular event. For example, how many times this user has made updates to his profile, and based on that provide some solutions. So the analytics service would be one of the consumers of this event. Similarly, we can have any number of subscribers to that event.
Vinayak Joglekar (19:20):
Does this mean that the only way data will get into a database is via an event or a message queue? Because then it starts making a lot more sense: you can move your position forward or backward on the message queue to recreate the state of your application at any time. Let’s say I received a complaint from a user, and there are hundreds of users at the same time; it’s a dynamic system. So to get the state of your system which existed when the user faced the problem, you can move backwards in your event queue to arrive at that state, or move forward gradually to see when that particular event actually occurred and the error was seen by the user. So this starts making a lot of sense. So my question is, is the event source the only source, or is there any other source from which data can get into a database?
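The idea of moving backward or forward along the event queue can be sketched as replaying an append-only log up to a chosen point in time. The names `EventLog`, `append`, and `stateAt` are hypothetical, timestamps are plain numbers for brevity, and the sketch assumes events are appended in timestamp order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only event log with point-in-time replay.
class EventLog {

    /** One recorded change: at 'timestamp', 'key' was set to 'value'. */
    static final class Event {
        final long timestamp;
        final String key;
        final String value;

        Event(long timestamp, String key, String value) {
            this.timestamp = timestamp;
            this.key = key;
            this.value = value;
        }
    }

    private final List<Event> events = new ArrayList<>();

    /** Appends an event; existing events are never modified or removed. */
    void append(long timestamp, String key, String value) {
        events.add(new Event(timestamp, key, value));
    }

    /** Rebuilds the state as of 'asOf' by replaying every event up to that time. */
    Map<String, String> stateAt(long asOf) {
        Map<String, String> state = new HashMap<>();
        for (Event event : events) {
            if (event.timestamp <= asOf) {
                state.put(event.key, event.value); // later events overwrite earlier ones
            }
        }
        return state;
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append(100, "u1.picture", "a.png");
        log.append(200, "u1.picture", "b.png");
        // State at the moment a user reported a problem (time 150).
        System.out.println(log.stateAt(150).get("u1.picture")); // prints a.png
        // State after the second update has been applied.
        System.out.println(log.stateAt(250).get("u1.picture")); // prints b.png
    }
}
```

Replay only reproduces the full state if every change went through the log, which is the point behind the question above.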
Vaibhav Patil (20:25):
The other option is that all these dependent services can expose APIs, which can be called from the command service. But if we go ahead with this implementation, then the command service will have to be tightly coupled with all those services. And in case another service is introduced later on, a change will also have to be made in the command service to interact with that service. So it becomes tight coupling, and that is not what microservices are about. Microservices should be distributed and loosely coupled from each other. That is why message brokers come into the picture, where we can scale the application and make it fault tolerant.
Vinayak Joglekar (21:06):
Right. Yeah. So this is very interesting. Since you mentioned fault tolerance, this also reminds me of the example of pets and cattle. Pets are the write or command type, where there is state, and you do not want to lose that state by killing the pet, right? And cattle are the read type. So let’s say you have containers in various modes. When it comes to high availability, I would pay a higher cost for, let’s say, the pets, which use higher-end setups with multiple zones and backups and all the good things that are available. Whereas I might go for a very cheap kind of service for the cattle, that is, for all those that are doing the read operations. It does not matter if one of those containers simply dies in the middle, right? Because there is no state in it. So can you talk a little bit about fault tolerance and how it is managed with this pattern of having pets and cattle?
Vaibhav Patil (22:21):
So in the same example, suppose a service is doing some command operation, and there are other systems dependent on that event. It is the responsibility of the broker to preserve that event until it is consumed by the dependent service. So even if the end service is not available, we can be very sure that whenever that service becomes available, it is going to get that event and process it. The only thing is that, for that, we have to make our broker highly available, so that it keeps storing whatever events are coming into it. If you’re talking about Kafka and RabbitMQ, they are good message brokers; they ensure that the message is delivered to the consumer. That way we can make sure that all the dependent systems have the same data.
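A rough sketch of a broker holding events for a consumer that is temporarily unavailable. `RetainingBroker` and its methods are hypothetical, and the model is deliberately simplified: real brokers track this differently (Kafka through consumer offsets on a retained log, RabbitMQ through queues and acknowledgements), and the sketch assumes a consumer registers before the events it cares about are published.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical broker that keeps a per-consumer backlog until it is collected.
class RetainingBroker {
    private final Map<String, Deque<String>> backlog = new HashMap<>();

    /** Registers a consumer so that later events are retained for it. */
    void register(String consumerId) {
        backlog.putIfAbsent(consumerId, new ArrayDeque<>());
    }

    /** Stores the event for every registered consumer, whether it is online or not. */
    void publish(String event) {
        for (Deque<String> queue : backlog.values()) {
            queue.addLast(event);
        }
    }

    /** Called by a consumer once it is available; returns and clears what it missed. */
    List<String> poll(String consumerId) {
        List<String> missed = new ArrayList<>();
        Deque<String> queue = backlog.get(consumerId);
        if (queue != null) {
            missed.addAll(queue);
            queue.clear();
        }
        return missed;
    }

    public static void main(String[] args) {
        RetainingBroker broker = new RetainingBroker();
        broker.register("query-service");
        // The query service is down while these two events arrive.
        broker.publish("picture-updated:u1");
        broker.publish("picture-updated:u2");
        // When it comes back, it still receives both events, in order.
        System.out.println(broker.poll("query-service"));
    }
}
```

The backlog lives in the broker's memory here, which is why the broker itself has to be highly available, as noted above.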
Vinayak Joglekar (23:13):
I agree. Yeah. That’s a good point. I mean, in terms of making things available, you have higher availability of the message broker to make sure that the data doesn’t go away until it is consumed by everybody. So it’s as good as a database by itself, right? That brings us to our next topic. You have events which are being logged, right? Essentially this is a running log of all events. Now, to monitor what is going on, let’s say there are a hundred microservices which are doing more or less the same thing, and then there are the next 50 microservices that are doing the next thing. Finally, something goes wrong in the first step or the second step, and you don’t know which step went wrong. And even if you know which step went wrong, you don’t know which one of the hundred microservices in the first step, or what combination of the first and the second step, resulted in the fault. So logging and tracing become much more complex than in a single-threaded kind of application in which only one thread is running. Here you have multiple threads running in parallel, and there are handoffs between the threads. How do you manage that? Can you explain a little bit more about logging and how the logging systems work for enabling debugging and monitoring?
Vaibhav Patil (24:51):
So when we implement microservices, it comes with its own complexities. And as you correctly mentioned, debugging any problem becomes cumbersome, because if you talk about containerized microservices, for one particular microservice we may have 10 different containers, and containers usually keep going live and going down. So when any particular container goes down, we lose the log information, because everything resides in that container. To handle this, the first thing is that we need to build a centralized mechanism to collect logs from all the containers running in the microservices ecosystem and store them in a centralized system, which can be distributed internally, along with a visualization system that looks into that store and brings up whichever data the user wants to see. So log aggregation and visualization is one part. The other point that you brought up is how developers will find out that a problem was caused at a particular stage. To handle that, there is the concept of a correlation ID. Whenever a request is made by any client, the microservice architecture should assign one unique identifier to every transaction which is related to that request. The correlation ID has to be propagated to all subsequent service calls, and it has to be sent back to the user. That is the correlation ID implementation. So if we have a centralized logging system and a proper visualization tool, what the developer has to do is search all the related logs using the correlation ID. With the help of that ID, they get all the information about what transactions are happening across services.
There could be a manual way to implement this, where we have to implement a filter which will inject the correlation ID into a variable, keep propagating it to all the subsequent services using interceptors, and then pass that ID back to the client using the response interceptor. So this is the manual way. But the Spring Cloud team has come up with an implementation, so they have a library. The developer has to add its dependency to the service, and it will automatically inject the correlation or tracing information. And it will make sure that that information is passed along to all subsequent services. It also has one advantage, that if suppose a developer has to..
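The manual approach can be sketched with plain maps standing in for HTTP headers. The header name `X-Correlation-Id` and the class `CorrelationIds` are assumptions for illustration; in a real service the `resolve` step would live in an inbound filter and the `outboundHeaders` step in a client interceptor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helpers; header maps stand in for real HTTP requests.
class CorrelationIds {
    static final String HEADER = "X-Correlation-Id"; // assumed header name

    /** Inbound step: reuse the caller's ID, or create one at the edge. */
    static String resolve(Map<String, String> inboundHeaders) {
        String id = inboundHeaders.get(HEADER);
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    /** Outbound step: copy the ID onto the request sent to the next service. */
    static Map<String, String> outboundHeaders(String correlationId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER, correlationId);
        return headers;
    }

    /** Every log line carries the ID so that aggregated logs can be searched by it. */
    static String logLine(String correlationId, String service, String message) {
        return "[" + correlationId + "] " + service + ": " + message;
    }

    public static void main(String[] args) {
        // The gateway receives a request with no ID and creates one.
        String id = resolve(new HashMap<>());
        System.out.println(logLine(id, "gateway", "request received"));
        // Service A receives the gateway's outbound headers and reuses the same ID.
        String idInServiceA = resolve(outboundHeaders(id));
        System.out.println(logLine(idInServiceA, "service-a", "processing"));
    }
}
```

Searching the central log store for the printed ID would then return the lines from both services.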
Vinayak Joglekar:
So what is this information other than correlation ID?
Vaibhav Patil:
The trace ID is what the correlation ID is. It is unique for the entire transaction. Then it adds the span ID. For example, there is a chain of microservices, and in a particular service there can be multiple logging statements. All logging statements in that one microservice will be assigned one span ID. That is one thing. Then we also have the application name; it also injects the application name in logging. Then the export parameter it adds indicates whether the data is being sent to the Zipkin server or not.
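To make the four fields concrete, here is a small model of a trace context that keeps the trace ID across services while issuing a new span ID per service. `TraceContext` and its methods are hypothetical and are not the library's classes; the bracketed prefix is modeled on the `[application,traceId,spanId,exportable]` style that Spring Cloud Sleuth 2.x adds to log lines.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of the four fields; not the tracing library's own types.
class TraceContext {
    final String appName;
    final String traceId;   // same for the whole transaction
    final String spanId;    // one per service hop
    final boolean exported; // whether timing data is also sent to the trace server

    private TraceContext(String appName, String traceId, String spanId, boolean exported) {
        this.appName = appName;
        this.traceId = traceId;
        this.spanId = spanId;
        this.exported = exported;
    }

    /** 16 lowercase hex characters from a random 64-bit value. */
    private static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    /** Starts a new trace at the first service that sees the request. */
    static TraceContext newTrace(String appName, boolean exported) {
        return new TraceContext(appName, newId(), newId(), exported);
    }

    /** Context for the next service in the chain: same trace, fresh span. */
    TraceContext nextService(String nextAppName) {
        return new TraceContext(nextAppName, traceId, newId(), exported);
    }

    /** Prefix attached to every log statement written under this context. */
    String logPrefix() {
        return "[" + appName + "," + traceId + "," + spanId + "," + exported + "]";
    }

    public static void main(String[] args) {
        TraceContext gateway = TraceContext.newTrace("gateway", true);
        TraceContext orders = gateway.nextService("orders");
        System.out.println(gateway.logPrefix() + " request received");
        System.out.println(orders.logPrefix() + " order lookup started");
    }
}
```

Both printed lines share the second field (the trace ID) and differ in the third (the span ID).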
Vinayak Joglekar:
What is Zipkin?
Vaibhav Patil:
Zipkin is a tool, or an implementation, which basically stores the data sent to it. With the help of that data, a developer can analyze the performance of each service against the performance of the total transaction. That is the benefit that we get using Zipkin. We were talking about Hystrix in some regards..
Vinayak Joglekar:
There is a bulkhead pattern in which the handoff was happening, but all it was handing off was just the correlation ID. There was no concept of spans.
Right? I mean, this concept of a span is neat, because you have a total request end-to-end, right? And maybe you have hundreds of handoffs happening. So you can break down those hundred into 10 logical spans, and then you only have to find the one span in which things are going wrong; you don’t have to look at all hundred.
Vaibhav Patil:
The other thing is that with Hystrix, the developer has to specify timeouts, because a few requests may take some additional time. So we need a mechanism which will tell us the average time taken by each service against the total transaction time under normal conditions, so that we can do the Hystrix implementation accordingly across all services.
Vinayak Joglekar (30:02):
What you’re saying is you have Sleuth, and you have Zipkin. Sleuth is the one which enables you to trace, and there is a trace ID, just like what we have as a correlation ID. In addition to that, it will contain time spans, or the timeouts if you will, and you will be able to visually monitor these using Zipkin. Isn’t that correct?
Vaibhav Patil (30:37):
Sleuth injects 4 things. One is the trace ID, which is unique to the transaction. Then the span ID, which is again a unique ID, but limited to whatever is happening in that service. Third is the application name itself. And export is whether that data is being sent to Zipkin or not. The timing of that particular service is captured by Sleuth internally, so Zipkin has that information. But Zipkin is not meant for log aggregation and not meant for visualization of logs. It is only meant for identifying the performance of each service against the total transaction. For log aggregation we need to incorporate separate systems. There is the ELK stack available: Elasticsearch, Logstash and Kibana.
Vinayak Joglekar (31:25):
But if you have the ELK stack, where Kibana as I understand is for visualization, then do you need to use Zipkin?
Vaibhav Patil:
When I say Zipkin, it is for identifying performance; it keeps track of how the real transaction is happening, in real time. But with Kibana it is really hard to track the time taken by each service, because it is just a log store. The advantage of Kibana is that whatever logs we have are searchable.
Vinayak Joglekar:
But that is the work of Elasticsearch, right?
Vaibhav Patil:
Yes. Internally that data is stored in Elasticsearch, which Kibana interacts with, and Kibana is just a visualization tool.
Vinayak Joglekar (32:09):
Yeah, but then Kibana, you are saying, is not a real-time tool; it is more like historical data being analyzed at a later point in time. Whereas what you see in Zipkin is in real time?
Vaibhav Patil (32:16):
Yes, in Zipkin you see how every transaction is performing. To make sure that we send all the data to Zipkin, there is a configuration needed in each microservice. It depends on the developer what percentage he wants to send to Zipkin, because it doesn’t make sense to send every log line to Zipkin. In that case, they can reduce the percentage. But in the development stage, you can go on sending all logs to Zipkin. So that percentage is controlled by the service.
Vinayak Joglekar:
Yeah, just like in the logging days, when we could turn the logging on or off, and have detailed logging or not-so-detailed logging. So there are various levels.
Vaibhav Patil:
Yeah, the log level is kind of the same idea, but this is not dependent on the log level. Whatever logs are generated, even if we have an info log level, only a few of them will be sent to Zipkin, based on the percentage. And that percentage is configurable for each microservice.
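The percentage-based export decision can be sketched as a small counter-based sampler. `PercentageSampler` is a hypothetical class, not the library's sampler; in Spring Cloud Sleuth 2.x the corresponding per-service setting is the `spring.sleuth.sampler.probability` property. Note that the decision takes no log level as input, which mirrors the point made above.

```java
// Hypothetical sampler: decides per request whether trace data is exported.
class PercentageSampler {
    private final int percent;
    private long seen = 0;

    PercentageSampler(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        this.percent = percent;
    }

    /** Returns true for exactly 'percent' out of every 100 consecutive decisions. */
    synchronized boolean shouldExport() {
        boolean export = (seen % 100) < percent;
        seen++;
        return export;
    }

    public static void main(String[] args) {
        PercentageSampler production = new PercentageSampler(10);
        int exported = 0;
        for (int i = 0; i < 1000; i++) {
            if (production.shouldExport()) {
                exported++;
            }
        }
        System.out.println(exported + " of 1000 requests exported"); // prints 100 of 1000 requests exported
    }
}
```

A development setup would simply construct the sampler with 100 so that everything is exported.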
Vinayak Joglekar:
Now, there are other systems which we are not going to cover today. You have a sidecar sitting alongside your containers that collects real-time data and makes it available for you to run analytics and monitoring on that data. Another system is Prometheus. So very soon, I think, Vaibhav, we will have another podcast, maybe in the next few months, where we will be covering exciting microservices topics. We also want to talk about service mesh using Istio and Envoy, and we will also be talking a little bit more about tracing and monitoring using Prometheus and things like that. So is there anything else that we needed to cover in this part?
Vaibhav Patil:
I think it is important to note that, with log aggregation and visualization, if we want to go ahead with the ELK stack, this is something which developers need to set up. But there are a few cloud platforms available. Papertrail is one example: we have to sign up and send all the Docker logs to Papertrail, and Papertrail provides us the log aggregation and visualization tool. That way developers are using their time effectively. So I think it’s good to have that, at least during the development phase.
Vinayak Joglekar:
All right. Thanks. It was a pleasure having you for these two episodes of the podcast on Spring Cloud and Spring Boot microservices, and we look forward to the next episode coming up on the same topic, talking about more exciting new horizons in microservices. Thanks.
Madhura Gaikwad (35:27):
Thanks, Vinayak and thank you Vaibhav for sharing your insights with our listeners and thank you everyone for tuning into this episode. If you are looking to accelerate your product roadmap, visit our website, www.synerzip.com for more information. Stay tuned to future Zipradio episodes for more expert insights on technology and agile trends. Thank you.