Interview with Cetin Karakus, Global Head of Quantitative and Analytical Solutions, BP
You have been in the commodity trading business for quite some time, after spending time working in financial markets. What are the most drastic changes you have seen in the oil/energy business in recent years?
I guess there are a few overarching themes. They may not be specific to the energy markets, but they are nevertheless changing them in significant ways:
First is financialization. Energy markets are primarily physical markets driven by the underlying supply-demand dynamics of the energy commodities involved. However, there are many derivatives and structured products designed and traded by market participants to tailor and fine-tune the exact risk profile each participant is comfortable with and willing to take. This process can be dated back to the late 90s and the infamous Enron, and we see a trend of expansion in the financial instruments available, both in terms of variety (products/markets, risk structures) and sophistication. Moreover, one can see the acceptance and active use of those instruments by the market players that make up the global energy value chain. This is thanks to the real societal benefits those instruments provide in terms of risk management and operational viability of certain crucial businesses (e.g. power utilities, airline companies, oil refineries, etc.) that are the backbone of the modern world we live in.
Second is automation. The oil and gas industry relies on a colossal amount of physical infrastructure (wells, pipelines, refineries, shipping fleets, gas stations, etc.) to operate. Safe and efficient operations do not just affect the bottom line of the energy companies that have to operate such a large number of infrastructure assets on a daily basis. They directly determine whether those companies are viable businesses and have a license to operate. There are many examples of operational mishaps either bankrupting or almost bankrupting energy companies that took their eyes off the ball. Automation technologies can help improve the processes involved in the safe operation of critical assets, both by cutting costs and by increasing the quality of the work carried out. For example, regular safety inspection of LNG tankers used to be a very manually intensive process where human experts had to examine each individual cell of the interior of the basketball-hall-sized box that is an LNG tanker. Thanks to drones armed with high-definition cameras and data analytics systems with sophisticated computer vision algorithms, we can focus the manual effort on the cells that look suspicious instead of going through each one of them by hand. The whole process is quicker, cheaper and safer. That summarizes automation.
Third is digitization. Pretty much every energy company is going through a “digital transformation” these days. The scope of these transformations may vary from company to company but it is fair to say they try to focus on making the energy companies financially leaner and operationally more efficient to adapt to the new normal of the energy transition with low margin business dynamics, heightened societal concerns about the role of the energy sector in the climate emergency we are facing and potentially disruptive competitive forces that are gathering pace.
In order to face such challenges successfully, energy companies have to transform the way they operate in a root-and-branch fashion. This is not just about using the latest machine learning and cloud technologies to improve the way you operate. That would be a superficial, fad-chasing exercise with obviously limited benefits. It should be about reconfiguring your organizational structures and the associated power and collaboration dynamics such that you can fully internalize the benefits of those so-called digital technologies and utilize them to the fullest extent of their capabilities. So, for example, cloud transformation should not just be lifting-and-shifting your enterprise applications to a cloud provider such as AWS or Azure. You will not need the armies of people who used to maintain your data centers after your move to the cloud, for instance. You can redeploy those resources into other value-added activities such as, perhaps, continuing the process of cloud transformation by optimizing your enterprise systems and applications to take full advantage of the new cloud environment. You can get rid of expensive database systems and use free/open-source alternatives, as we have been doing in BP for example.
Finally, electrification. Perhaps I should call it “greenification”. We are at a critical juncture in terms of climate change, and energy companies have a huge responsibility to make the energy transition possible; every company has a part to play in this colossal effort. The world experienced the first electrification wave at the end of the 19th century and through the early part of the 20th century.
That was mostly through lighting up cities and homes and later through industrial and consumer applications of electricity. We are on the verge of a second wave that will primarily be driven by mobility and transportation needs, thanks to the incredible advances being made both in clean (low-carbon) generation of power and in high-density storage of that energy due to innovations in battery technology. Coupled with the advances made in computing (IoT, ML, predictive analytics, etc.), we can cut our reliance on fossil fuels and go through the energy transition period with minimal sacrifices in terms of the perks of modern life we all got accustomed to in the second half of the 20th century.
At ETOT this October you’ll talk about BP US’s data journey. Can you share with us what your top 3 priorities have been while building and enhancing the data platforms?
I am currently in charge of a strategically important data analytics platform that our North American Power and Gas trading business relies on for its physical operations and trading activities. Though I had experience with creating data analytics platform architectures and had a sound system design mindset prior to taking over this responsibility, I gained a chance to validate my long-held architectural design principles against the backdrop of a large-scale, mission-critical, legacy (i.e. original developers long gone) system while working on it and evolving it beyond the design context of 10 years earlier, when it was first conceived.
My first priority was adopting the principle of evolution, not revolution. There was huge hype around Hadoop at the time I took over running our data platform. The platform is based on a star-schema data warehouse architecture on an Oracle database and is finely tuned for the complex queries we run to drive the sophisticated analytics that our business stakeholders depend on for their day-to-day activities. Here I was hearing “brilliant ideas” to replace our data platform with a new system where we would store all our data on regular file-system-based storage distributed across a Hadoop cluster. This would imply that, in order to run even some basic SQL-like queries, we would have to pull the data from across the Hadoop cluster and cache it in an in-memory server cluster before we could run those queries. This was a much slower architecture than our existing one (think of all the latency involved in collecting data from across the Hadoop cluster, caching it in the in-memory cluster and keeping the two in sync) and a more expensive one to operate (RAM costs far more than hard disk storage per byte). Moreover, it was also an inferior one (we would not even be able to run complex queries) with lots of potential hazards (imagine the issues when the data on disk and the data in the in-memory cache are out of sync). Our existing platform relied on rock-solid relational database technology, which has matured over four decades with constant improvements and innovations and is grounded in a sound theoretical foundation called relational calculus. I just could not see any reason to move away from all that to such an inferior, faulty and expensive platform.
Using a new technology just for the sake of using it without understanding its implications and comparative value proposition with respect to the existing, functioning technologies can only be described as profound ignorance and I had to waste some of my valuable time to deal with that.
I am really glad that the Hadoop hype has finally died off since then, but this does not mean that this kind of solution-in-search-of-a-problem is going to disappear. Sadly, such solutions just move on to the next target on the Gartner hype cycle. I suggest you ask really hard questions the next time someone is exuberantly excited about replacing a perfectly functional system with the next “cool technology”, especially when that technology is riding the Gartner hype cycle. This does not mean being a Luddite and being closed to technological ideas. Quite the contrary, it actually means really understanding the technology and the inevitable trade-offs inherent in every technological choice, and having the courage to confront hype-driven propositions that will not add any value to your business and its technology estate.
My second priority was increasing scalability to stay relevant. I mentioned that our data analytics platform uses a star-schema data warehouse backend. Such a backend is perfect for the typical drill-down uses of business intelligence tools and powers the whole ecosystem of end-user computing use cases (reports, custom data-driven apps, etc.). Such use cases are very important for the day-to-day operations of our business and we cannot afford to discontinue the capabilities our platform provides to enable them. However, we have been noticing the emergence of a new strain of use cases based on more advanced data analytics and machine learning workflows that consume data at both a higher volume and a higher frequency (i.e. faster).
It is like suddenly having to supply a new 8-cylinder, high-performance engine when we were used to supplying a more pedestrian 5-cylinder one. In our case, trying to support algorithmically driven ML use cases with our existing architecture, which is optimized for incremental drill-down use cases, would have put a huge strain on our data backend. That would have degraded the experience of existing BI users while not really providing the high throughput expected by the new ML users. Our solution was to evolve our system by adding a new data aggregation and serving tier where we could pre-compute large datasets from the finely dimensionalized data in our data warehouse. Since we could run the complex queries that generate the wide (many columns) and large (many rows) datasets once (instead of the warehouse being hit every time someone needs that data) and store the results to serve all ML users, we could ensure the scalability of our platform in the face of new and growing demand from ML processing pipelines. The only catch to this otherwise glossy story is that we have to create an additional storage tier and store duplicate copies of the data so that we can serve it fast. As storage is very cheap and we decided on a free database system (Postgres) running in the cloud, this is not a big issue for us and has a great benefit/cost profile.
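The pre-computed serving tier described above can be illustrated with a minimal sketch. It is not BP's implementation: it uses Python's built-in sqlite3 module as a stand-in for the warehouse and the serving database, and every table name, column name and value in it (dim_hub, dim_date, fact_price, serving_prices_wide, "HubA") is hypothetical. The idea shown is only that the join across the star schema runs once per refresh, and ML readers then query the flat result.

```python
import sqlite3

def build_warehouse(conn):
    # Hypothetical star schema: one fact table and two dimension tables.
    conn.executescript("""
        CREATE TABLE dim_hub (hub_id INTEGER PRIMARY KEY, hub_name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE fact_price (hub_id INTEGER, date_id INTEGER, price REAL);
        INSERT INTO dim_hub VALUES (1, 'HubA'), (2, 'HubB');
        INSERT INTO dim_date VALUES (1, '2020-01-01'), (2, '2020-01-02');
        INSERT INTO fact_price VALUES
            (1, 1, 2.10), (1, 2, 2.25), (2, 1, 1.90), (2, 2, 1.95);
    """)

def refresh_serving_table(conn):
    # Run the expensive join once and store the denormalized result.
    # The cost is a duplicate copy of the data in the serving tier.
    conn.executescript("""
        DROP TABLE IF EXISTS serving_prices_wide;
        CREATE TABLE serving_prices_wide AS
        SELECT d.day, h.hub_name, f.price
        FROM fact_price f
        JOIN dim_hub h ON h.hub_id = f.hub_id
        JOIN dim_date d ON d.date_id = f.date_id;
    """)

def read_for_ml(conn, hub_name):
    # ML consumers read the flat table and never touch the star schema.
    return conn.execute(
        "SELECT day, price FROM serving_prices_wide "
        "WHERE hub_name = ? ORDER BY day",
        (hub_name,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
build_warehouse(conn)
refresh_serving_table(conn)
print(read_for_ml(conn, "HubA"))
```

In a real deployment the refresh would run on a schedule or after each warehouse load, and the serving table would live in a separate database so that ML reads do not compete with BI queries.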
My third priority was increasing cost-efficiency for adding value. We currently use a very expensive database system (Oracle) to run our platform. Though there are certain proprietary aspects of Oracle we take advantage of to run our system efficiently, we realized that there are not any features of Oracle we cannot live without if we decide to migrate our system to an open source, free alternative so long as we are willing to make some engineering changes to optimize our platform to take advantage of the substitute features and strengths of the chosen new platform.
So, we decided to move our data analytics platform from Oracle to Postgres (on AWS) after careful consideration and with a diligent system migration and rollout plan. This will save us a seven-digit sum in Oracle licensing fees in the medium term, and our system is just one example; there are dozens (maybe hundreds) of other systems across our company that are likely to take the same journey. If you are going to consider such a move in your organization, here is my advice:
1-Set up an enterprise-wide program to help/support/coordinate all such efforts. This program will also be where all the acquired wisdom is collected and a knowledge base of issues and workarounds is captured. This will be very useful for the second, third and subsequent waves of migration efforts.
2-Pick some competent and brave teams to trail-blaze the migration project. They will set up the beachhead, if I may use military terminology. The issues they come across and the solutions they develop to address them will seed the knowledge base mentioned earlier, and their collective sacrifices (hard work, hours of debugging, etc.) will help reduce the “casualties” you might otherwise experience in the second, third and further waves of migration efforts.
3-Though you should support the individual migration teams by means of providing development resources they need and ensuring they get the support they require from external parties (e.g. AWS engineering teams), you should empower them to be in charge of the migration of their specific projects (i.e. making ultimate decisions) and hold them accountable for the success of the migration.
4-Ensure that your migration teams adopt a “make-before-break” approach. The whole point of migration is to save costs, and you are certainly not going to achieve that if you suddenly break the operational systems that your business relies on to make money. As the technical owner of our data platform, I personally created a detailed, fail-safe, multistage project plan under which we would not switch over from the existing Oracle-based system to the new Postgres-based one without first ensuring the successful operation of the latter. This kind of approach significantly de-risks an otherwise potentially risky, thankless project.
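A minimal sketch of the “make-before-break” idea in point 4 follows. It rests on stated assumptions rather than on the actual project plan: both systems run side by side, a set of check queries is executed on each, and the new system becomes active only when every result matches. The function names, the check query name and the sample rows are all hypothetical, and the backends are plain in-memory stand-ins rather than real Oracle or Postgres connections.

```python
from typing import Callable, Dict, List

# A backend is anything that takes a query name and returns rows.
Backend = Callable[[str], List[tuple]]

def parity_report(legacy: Backend, candidate: Backend,
                  checks: List[str]) -> Dict[str, bool]:
    # Run each check on both systems; row order is ignored.
    return {q: sorted(legacy(q)) == sorted(candidate(q)) for q in checks}

def choose_active(legacy: Backend, candidate: Backend,
                  checks: List[str]) -> Backend:
    # Switch over only if there is at least one check and all of them pass.
    # Otherwise the legacy system stays in service.
    report = parity_report(legacy, candidate, checks)
    return candidate if report and all(report.values()) else legacy

def make_backend(data: Dict[str, List[tuple]]) -> Backend:
    # Hypothetical in-memory stand-in for a database connection.
    def run(query: str) -> List[tuple]:
        return list(data.get(query, []))
    return run

legacy = make_backend({"daily_positions": [("HubA", 10), ("HubB", 7)]})
broken = make_backend({"daily_positions": [("HubA", 10)]})
ready = make_backend({"daily_positions": [("HubB", 7), ("HubA", 10)]})
checks = ["daily_positions"]

print(choose_active(legacy, broken, checks) is legacy)
print(choose_active(legacy, ready, checks) is ready)
```

Keeping the legacy system as the default outcome means a failed or incomplete comparison never takes a working system out of service.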
Data analytics has become an essential part of the commodity trading business. What would be your advice to a company that is not yet very engaged in that space? Where should they start? Where can they find guidance?
My advice to them: Get in the space now! In fact I would be very interested to know about their “secret” of dealing with the competition all of which I am sure are already “arming” themselves with data analytical capabilities. Inaction is not an option if you want to stay around.
I do not think starting out should be a challenge. Any successful data analytics effort starts with, you guessed it, data, and I am sure you already have a multitude of OLTP (operational data like orders, shipments, trades, etc.) and OLAP (data warehouses, data marts, BI databases, etc.) data stores. They are likely to hold highly structured, valuable data you can seed your advanced data analytics efforts with.
Avoid the folly of thinking that buying a shiny new toy (product) will suddenly boost your capabilities in this space. It won’t. At best it will damage your bottom line, and at worst it will sap energy from your teams and make you focus on the wrong things. You could easily be fooled by the sunk cost fallacy and try to make the best of “your investment”. So, start by investing in your people, not in third-party products.
There is a ton of freely available, open-source products in the data analytics space and you will need a very strong case to justify paying for any proprietary software. You will certainly not be qualified to make that judgement call if you are just starting out in the space. So stick to freely available, open-source products initially.
Bring new talent to bootstrap your efforts if necessary. If you are going to use consulting services, prefer outcome based, fixed-price engagements.
Start small and grow steadily based on the early successes and lessons learned.
Focus on quality not quantity when hiring. Remember, one good developer could do more than 10 mediocre ones.
As for guidance, hire one or two capable individuals and empower and incentivize them appropriately to bootstrap your efforts. They can help with hiring the talent you need and redeploying the people you already have. They can also steer you in the right direction in response to the trends building up in your industry and future-proof your data analytics efforts against those strategic storms.
ETOT and DigiCom will take place as virtual events this year. What do you look forward to at these two events?
This is the first time I am going to participate in the event, and I am excited to get a chance to connect with other participants and exchange ideas with people whose backgrounds and experiences differ from mine.
Different perspectives and approaches you learn about in such engagements have always made me reshape some of my own thinking and hence helped me improve my professional work. I can see a distinguished list of participants in both events with a very broad range of experience and backgrounds, so I am particularly excited to engage with them to learn about new things.