
Molecule Webinar: Sink or Swim – making data lakes a central pillar of your C/ETRM

September 1, 2023

Background

Recent years have seen tremendous advances in analytics, AI and machine learning, and many other forms of interpreting and actioning data to achieve trading advantage and process efficiencies.

However, the best data in the world and the most advanced analytics tools are almost useless without a strong and effective data management strategy in place, firmly integrated and understood throughout the organisation.

This important webinar brings together traders with technology and industry thought leaders to explore the latest best practices in data management. The panel gave us valuable insights into how data lakes work in partnership with your C/ETRM, allowing real-time answers to complex data queries and providing an absolute trading advantage.

Subjects to be discussed include:

  • The importance of data interoperability in a shifting energy market
  • Approaches to managing and analysing unstructured vs structured data
  • Best practices for optimising data accessibility across your trading organization
  • The key role your E/CTRM plays within your data ecosystem

Our expert speaker panel includes:

  • Timothy Kramer – Founder – CNIC Funds
  • Alex Whittaker – General Manager – Bonroy Petchem
  • Paul Kaisharis – SVP Software Engineering – Molecule
  • Kari Foster – VP – Molecule
  • Ryan Rogers – Principal – Enite
  • Ben Hillary – MD – Commodities People

Transcript

BEN HILLARY

Well, hello everyone, and welcome to today's webinar, Sink or Swim: Making Data Lakes a Central Pillar of Your C/ETRM. My name is Ben Hillary, managing director of Commodities People. And yeah, we'd really just like to say a huge thank you to everyone for being here with us. Really delighted to see how much this webinar has attracted the interest of the industry, with over 400 registrants from all corners of the globe and all parts of the commodities and data ecosystem. In the next 60 minutes, we'll be deep diving into the latest best practices in data management. We'll be exploring how data lakes can work in partnership with your C/ETRM, allowing real-time answers to really complex data queries, with the goal of providing absolute trading advantage. Recent years have seen really incredible advances in analytics, AI, machine learning, and many other forms of interpreting and actioning data. However, all this is virtually useless without a strong and effective data management strategy in place, firmly integrated and understood throughout the organization. This is what we aim to shine a light on and provide best practice for today. We've got a truly expert speaker panel lined up, to whom I'm very grateful for their time and input. Some of the subjects we'll be covering today include the importance of data interoperability in a shifting energy market, approaches to managing and analyzing unstructured versus structured data, best practices for optimizing data accessibility across your trading organization, and the key role your E/CTRM plays within your data ecosystem. The webinar will take the format of a panel discussion followed by Q&A. So on that note, throughout the webinar, please be posting your questions in the Q&A box and upvoting others of interest. Also do make full use of the chat channel for any comments you want to share with the panel and the audience, or even just to say hello and introduce yourself. 
I am now delighted to pass over to Kari Foster, VP for Molecule. Kari, the floor is yours.

KARI FOSTER

Thank you so much, Ben, and it's great to be here. As Ben mentioned, I'm Kari Foster. I'm the VP of Marketing at Molecule, which is the modern ETRM and CTRM platform. An ETRM or CTRM, if you're not familiar, is an energy trading or commodity trading risk management platform. I'm really excited to be here today, introducing just a stellar panel of experts representing trading, risk management, and technology. And this is really such an important topic for anyone within the trading organization who depends on data to do their jobs. Basically, don't we all. And data management strategy is so much more than just the technology you have in place, and certainly that's going to be covered today, but it's how the data is structured, how it's accessed and consumed, the ways that it can be analyzed. And that all starts with a strategy that has the end in mind. So without further ado, I'd love to get this really important discussion going by introducing today's panelists. First, Paul Kaisharis is my colleague, the senior VP of engineering here at Molecule. Tim Kramer is the founder and CEO of CNIC Funds, and he actually used Molecule to model the prototype for their US carbon-neutral power futures index ETF. Ryan Rogers is principal at Enite, which is a management consulting firm delivering strategic solutions to energy, utilities, and manufacturing. And Alex Whittaker is general manager at global energy trading and supply company Bonroy Petchem. And I'll pass things back over to you, Ben.

BEN HILLARY

Excellent. Thank you, Kari. Right, drumroll. We will get kicked off right away with a poll for the audience. So I am launching the first poll now, and hopefully everyone can see this. So the question is, and it's a single choice: what is the biggest challenge with getting better insights from your trading data? Is it data quality, data management tools, data analytics tools, overall data strategy, internal skills or knowledge, or unsure/not applicable? So if everyone can ponder that one, and I'll end the poll in about 10 seconds. Excellent. Okay, I am ending the poll now. Okay, so, interesting results. Alex, Tim, how do these results line up with your own experiences? From your perspectives, how do they align with the main challenges you face?

ALEX WHITTAKER

I mean, I think I can relate to those answers, like the data strategy and data quality being the main problems, but also the situation that it's just a bit of everything. My experience with this is just how fragmented all the different data sources are, the volume of data, all the different sources, all the different licenses that I need. How do I get all of my prices together, even for what is a relatively simple trading operation? Yeah, that's what I take from this. My own struggles at the start of getting everything set up for Bonroy, I think, come through in that data quality and data strategy, and just thinking, how do I get just basic prices in, let alone more complicated prices? That's what concerns me at the moment. Overall data quality.

TIMOTHY KRAMER

For what we're seeing, the overall quality and management haven't really been a problem, because we're using exchange-traded prices. And so those things have, like, an auto-scrape, and that just hasn't been an issue. And the skills and knowledge, I mean, the people that we're seeing right now that are working for us, plus the people that we interface with, are just amazing. The younger generation and their math skills, stat skills, what they know and how computer literate they are, it's just stunning. But the part that we see that's a little bit challenging still is the analytics on this. There's different math techniques and different things that people want to look at. Different people have, like, a different version of how they want to do a Sharpe ratio, things like that. And then when people try to use the data to get some more insights out of it and actually make something useful with it to try to get an edge, that's where there's just so much more information that you can tease out of the data, like cross-correlation and cointegration and things like that. So trying to isolate the individual components once you have the data, that's kind of been what we see as the biggest challenge.
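A minimal sketch of the kind of cross-correlation analysis Tim describes, in Python with NumPy. The series and lag range are illustrative, not taken from any real market data:

```python
import numpy as np

def cross_correlation(x, y, max_lag=5):
    """Pearson correlation of y shifted against x, for lags 0..max_lag.

    Returns a dict mapping lag -> correlation, so you can see whether
    one price series tends to lead or follow the other.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(max_lag + 1):
        a = x[lag:] if lag else x   # drop the first `lag` points of x
        b = y[:-lag] if lag else y  # and the last `lag` points of y
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out
```

Cointegration testing (for example Engle-Granger) usually calls for a statistics library such as statsmodels; this sketch covers only the correlation side.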

BEN HILLARY

Excellent, thank you. Well, next question. The trading landscape itself has changed immensely in recent years. We've got factors like the energy transition, the rise of carbon markets, general shifts in technology. What has been the impact on business needs from a data perspective? Ryan, if we could start with you on that question, then we'll go to Tim, Paul and Alex.

RYAN ROGERS

Certainly. So the traditional, highly structured data remains critical: things like market risk management, credit risk management, compliance. Where I think we've seen new needs and new analytical capabilities needed is in some of the renewables markets. There's much larger networks for the Internet of Things, a lot more smart grids and sensors bringing in semi-structured data, things that are in JSON or XML. So there's more of a struggle to deal with that semi-structured data in the traditional data warehouses. And then I think the emerging markets, carbon credits, environmental credits, some of those markets where they are on exchange-traded platforms and have structured prices, it's great, but some of them are auctioned, so there's infrequent data points to pull in. So I think some of those emerging markets, and the need for just monitoring newsfeeds and market sentiment for some of those emerging markets, would benefit from natural language processing, machine learning, and some of those emerging capabilities.

TIMOTHY KRAMER

So what we've kind of seen for how the landscape has changed has been sourcing and documentation of the data. So the data comes in, it's good, but everyone asks, well, where'd you get that? Where did that come from? Well, what's the web link to that? Well, how can I verify that? So there's been a big push, and again, because we're registered with the SEC for what we do, there's a big push on the actual data sourcing and documentation, for auditing, for SOC compliance, et cetera. And as Ryan said, then when you're taking a look at carbon, people want to be able to kind of verify things all the way back to the source and say, okay, does this qualify for SFDR Article 6, 8, or 9, whatever it would be? So that would kind of be the thing that we're seeing: the documentation, and it goes all the way down, so someone can look at the actual links and verify it.

BEN HILLARY

Excellent. Paul, your thoughts?

PAUL KAISHARIS

Yeah, so I guess what I would add, we've kind of touched on this already, about the volume and the variety of sources of data. And just the volume of that data has increased tremendously over time. And it's really a big challenge to manage this volume and make sense of it and add value to the business. There's so much data coming from all different sources. And Tim touched on it: just having that connection of where that data has come from, and the relevance of it. You know, I would say technology has matured quite a bit. And for businesses, how to use that technology, how to manage that volume of data and make appropriate use of it, I think that's an impact on the business, to decide how to best utilize and leverage that. And then, of course, there's the larger question of AI, the evil AI, and what do we do with machine learning and large language models? That's a real question businesses need to ask themselves: if their competitors are going to be using that technology, is that going to leave them behind? So I think there's some big questions around that for businesses to answer.

BEN HILLARY

Alex, your thoughts?

ALEX WHITTAKER

Yeah, I echo what Paul just said about that sort of impact. We've seen more data, more data vendors, more delivery methods, more choice, more complications, more costs, more service problems. And yeah, just the sheer growth in data and the different sources it comes from. I think it ties in quite well with this topic of data lakes and your CTRM, as a solution to that: trying to get that fragmentation to come together in one place, with people who know what they're doing and how to do it, and trying to streamline it that way. That's something I've actually learned during the process of doing this panel, talking to these guys. So it's something I'm looking at at Bonroy right now, in fact.

BEN HILLARY

Next question, we'll go to Ryan and Paul with this one. What is the difference between unstructured, semi-structured, and structured data?

RYAN ROGERS

So, structured data is the one we're all familiar with: all of the pricing, volume, and transactional data, contracts. The sources are well known, it's timestamped, it's verified, there are trade controls, a person monitoring that data daily. The semi-structured is probably the next most useful bucket of data that I've seen at my clients. So this is time-series data that has some structure but is not necessarily timestamped, cleaned, or verified: things like meter data, SCADA data, PI data, things that are necessary and useful for all the ancillary operations but difficult to get into a highly structured format, or the data source is enormous, so it's time consuming and complicated to integrate into your highly structured data warehouse. So that's probably the next most useful category. Unstructured data is things like text, images, video, things that would benefit from natural language processing and machine learning, especially in some of these emerging markets like we've been discussing.

PAUL KAISHARIS

I mean, Ryan hit on it, but maybe at a slightly lower, not too technical level. With the structured data, it's traditional relational database data, stuff that's stored in tables: trade data, market data curves, trade valuations. That's considered the structured data, with rows and columns of information. On the semi-structured side, to elaborate on that one a little bit, it's kind of JSON- and XML-type data, where you get a little bit more descriptive information about what that data is about. For example, for Molecule, on the semi-structured side: obviously Molecule has a lot of structured data with what I just mentioned in terms of trades, market data, et cetera. But on the semi-structured side, we provide value-at-risk calculations through a JSON structure, which is a more complex structure where you can get more descriptive information. Also, a lot of modern ETRM systems, of course, like Molecule, have APIs that return data in that JSON structure. So what do you do with semi-structured data like that? And then Ryan mentioned the unstructured, which is the documents, video, and images. That's a typical categorization of those data structures.
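To make the semi-structured idea concrete, here is a small Python sketch that flattens a nested JSON payload into relational-style rows. The payload shape and field names are invented for illustration; they are not Molecule's actual API format:

```python
import json

# Hypothetical semi-structured payload, loosely modeled on the kind of
# VaR-by-portfolio JSON an ETRM API might return (field names invented).
payload = json.loads("""
{
  "as_of": "2023-09-01",
  "portfolios": [
    {"name": "power-east", "var_95": 1250000.0, "positions": 42},
    {"name": "gas-basis",  "var_95":  830000.0, "positions": 17}
  ]
}
""")

# Flatten the nested structure into rows a relational store could hold.
rows = [
    (payload["as_of"], p["name"], p["var_95"], p["positions"])
    for p in payload["portfolios"]
]
```

The flattening step is exactly where semi-structured data earns its name: the JSON carries its own field descriptions, but you still choose how to project it into rows and columns.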

BEN HILLARY

Excellent. And I guess following on from that, what are your approaches and your best practices in managing unstructured, semi-structured, and structured data? Paul, if you want to continue?

PAUL KAISHARIS

Sure. So I would always say start with security, start with security first. You've always got to consider security of the data. Always consider the principle of least privilege, which basically means only provide access to what a person or entity needs to do their job. So start with security first. Alex mentioned all these different sources and different locations of the data: bring all the data together into a centrally managed storage location. Bring that information together. And I mentioned technology maturity: take advantage of relatively low-cost cloud storage infrastructure. There's ways to store and bring this data together, and bringing that data together while taking advantage of that lower-cost cloud storage, I think, is something to consider there.

BEN HILLARY

Excellent. Tim, your thoughts on that?

TIMOTHY KRAMER

Those guys pretty much hit all the points. I have nothing of substance to add, but thank you.

BEN HILLARY

All good. Ryan, do you have anything to add in terms of how the best practices of managing those data types?

RYAN ROGERS

I think Paul hit the big ones. The only thing I would add is maybe data governance: policies, ownership, policies on quality checks, data lifecycle management, just kind of ownership, health, and hygiene: when it gets purged, where it came from, documentation. But Paul had the big one, security and access.

BEN HILLARY

Excellent. Okay then. Well, on this subject, let's move on to our second poll. But before we do that, I would just like to remind the audience that we will be taking your questions towards the end of this discussion. So do keep on putting them into the Q&A and upvoting any within there that are of interest. So moving on to our next poll. Okay, I am launching the next poll, so everyone should see that now. Are you currently using or considering implementing a data lake? Single choice: yes, I'm using one; I'm considering implementing one; no, I've got no plans to; or not applicable. So, audience, keep on throwing your answers in there and I'll close the poll in 5 seconds. 5, 4, 3, 2, 1. Ending the poll, sharing results. So I'm not sure if that's as expected or a surprise for the panel. I guess it's kind of what I would have expected.

RYAN ROGERS

I'm actually surprised: 24% actively using is a little higher than I expected.

BEN HILLARY

Good to see, good audience. Okay, excellent. So, next question. Data interoperability. It's a term that is talked about a lot in the context of a data management strategy. But what does that mean and how does a data lake enable better data interoperability? Paul, if we could go to you with that one, please.

PAUL KAISHARIS

Sure. Probably best to first start with the definition of data interoperability, and I just happen to have the definition handy, so I'll just read it. Data interoperability, the pure definition of it, is the ability of systems and services that create, exchange, and consume data to have clear, shared expectations for the contents, context, and meaning of that data. So it's for shared systems to create some type of meaning from that information. The data comes from different sources, with different types of data attributes, but how do you provide a common meaning for it? So interoperability provides, like I said, meaning and context to the data. It allows disparate data to be organized and cataloged, and an implementation of data interoperability also relies on good metadata management. And to the second part of the question, how does a data lake support that? It really is about that metadata management, the cataloging of the data, really automating that and providing that type of meaning. We'll talk more about this later, but in terms of what these kinds of technologies allow, to bring all these different structures of data together, how do you provide that meaning to it? And really, that metadata layer that provides information about the data, which you can use to catalog, describe, and organize that information, that's really the idea behind data interoperability and what data lakes do to support it.

BEN HILLARY

Right.

RYAN ROGERS

I'll pick up on the second half of that question, how data lakes enable this. And I think there's four key elements that a data lake enables. Flexibility: being able to pull in disparate data sources without predefining the schemas, and the speed with which you're able to do that. Scalability: a lot of these new cloud services do allow much cheaper and larger storage. Then making the disparate data sources centrally located, so there's one place to go. And then making that central location queryable: some of these advanced tools that are emerging are enabling users to query some of that semi-structured and unstructured data together.
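A toy illustration of the schema auto-detection idea the panel keeps returning to: infer a crude column-to-type mapping from raw JSON-like records, the way lake catalogs do on ingest. A real catalog handles nesting, nulls, and type widening far more carefully; this only sketches the principle:

```python
def infer_schema(records):
    """Infer a crude column -> type-name mapping from a list of dicts.

    When the same field shows up with two different types across
    records, it is marked 'mixed' rather than silently coerced.
    """
    schema = {}
    for rec in records:
        for key, val in rec.items():
            t = type(val).__name__
            if key in schema and schema[key] != t:
                schema[key] = "mixed"   # conflicting types observed
            else:
                schema.setdefault(key, t)
    return schema
```

The point of the sketch is the "no predefined schema" property Ryan mentions: the structure is discovered from the data on read, not declared up front as in a warehouse ETL project.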

BEN HILLARY

Thank you. Next question. I'd like to take this one to Tim. What role does the ETRM play in the data management strategy? What are the benefits of using a data lake in partnership with your ETRM? Because I believe you've got a real life experience here.

TIMOTHY KRAMER

Sure. So some background information to kind of give you the context for the answer here, and that is: electricity is the most consumed commodity in the US on a retail notional basis, but it wasn't in any index. It wasn't in any ETF, any mutual fund, nothing. So what we did is we created the first-ever carbon-neutral electricity index. And then we partnered with ICE, the Intercontinental Exchange. We published the index in January, and then in mid-May we launched an ETF on the New York Stock Exchange. The ticker is AMPD. So, given that, we had to develop the index from scratch, because nothing existed. What we did is we actually used the risk system for this. And we were kind of adamant that we wanted to develop the product inside of the risk system, and then we wanted to use that to basically run and manage the product. And we also wanted to use it to market the product, so that when people had any questions, they could say, where'd you get that? Here it is. Right. So when it came time for the development of this, we used the risk system because we had to figure out what was the optimal setup for the index. So you're trying to figure out what the best risk-adjusted returns are. So you're looking at all sorts of variables, like roll windows, future tenor, future selection, weights, collateral, et cetera. And so you have all those different pieces, and that's a heck of a lot to try to digest and figure out what the right thing is. And so, I mean, if you do that in Excel or some other database, you're going to mash F9 and get a white screen for about five minutes, and maybe you get an answer and maybe you don't. But if you do that in a risk system which allows you to customize those inputs, you get an answer right away, and it just makes the optimization a lot easier. So the benefits of that are, in the development part of this, it just saves you a lot of time. It's more reliable, and it just looks more professional when you present it to people. 
So when we walked up to ICE to partner with them and we showed them the risk system and how we developed inside the risk system, they were just, okay, this is great. Let's go. In terms of the ongoing management, then, since the product is up and running, the obvious things are performance and P&L, but then you get a lot of questions, you know, risk metrics, and then you get questions about PCA, principal (or portfolio) component analysis. So, okay, what percentage of returns came from this kind of electricity futures, and what percentage came from carbon? And so you want to be able to tease that out and have that in the risk system, and not have to keep downloading data and beating up Excel and trying to create all these bespoke reports that somebody may or may not actually pay attention to. And then when it comes time to the marketing, it doesn't matter who you walk in and talk to when you market it; whatever you have, they want to see something different. So if you say, look, here's the Sharpe ratio, they go, oh well, we use the information ratio here, we use the Sortino ratio here. So it helps if you have a risk system that has those things in it, where you can give those things real time and live, and they also vary the time frame they want to see. So a risk system that can give you all those portfolio metrics basically in real time, as you're walking in and talking to a prospective client, that helps. And then you're always going to get these bespoke requests, like they want to see what's the correlation of this product to that product, what's the correlation over a different time frame. And then they sometimes want to have the data exported so they can run it in their own systems. And so having that function inside of a centralized risk system is invaluable. So that's kind of how we tackled this. 
And that's why having everything from soup to nuts in a risk system was valuable to us, because there's no handoff problems, there's no data leakage, and you can respond in real time, and it just makes everything look more credible and more professional.
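The portfolio metrics Tim mentions are straightforward to compute once the return series lives in one system. A minimal Python sketch, assuming daily returns and the common 252-trading-day annualization convention:

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization convention

def sharpe(returns, rf=0.0):
    """Annualized Sharpe ratio from a series of daily returns."""
    ex = np.asarray(returns, dtype=float) - rf
    return float(ex.mean() / ex.std(ddof=1) * np.sqrt(TRADING_DAYS))

def sortino(returns, rf=0.0):
    """Like Sharpe, but penalizes only downside volatility.

    Needs at least two below-target observations for the sample std.
    """
    ex = np.asarray(returns, dtype=float) - rf
    downside = ex[ex < 0]
    return float(ex.mean() / downside.std(ddof=1) * np.sqrt(TRADING_DAYS))

def information_ratio(returns, benchmark):
    """Active return over tracking error versus a benchmark series."""
    active = np.asarray(returns, dtype=float) - np.asarray(benchmark, dtype=float)
    return float(active.mean() / active.std(ddof=1) * np.sqrt(TRADING_DAYS))
```

Note how the information ratio collapses to the Sharpe ratio against a zero benchmark; that is exactly why a prospect's "we use a different ratio here" objection is easy to absorb when the raw return series sits in one place.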

BEN HILLARY

That's excellent. Well, next, let's go back to Ryan and Paul. What are the benefits of a data lake, and how do you prevent it from becoming, a term I love, a data swamp?

RYAN ROGERS

The benefits are some of those we've mentioned already: the scalability, being able to incorporate enormous amounts of disparate data; flexibility in dealing with queries of structured, semi-structured, and unstructured data; the lowered costs that a lot of these new cloud tools offer; speed in incorporating new unstructured or semi-structured data sets, rather than going through the project lifecycle of ETL, extraction, transformation, and loading; and then the advanced analytics capabilities. And this is all fairly new to the ETRM/CTRM market, but all of the machine learning, and some of the older tools like natural language processing, enable more advanced analytics. And then on preventing a data swamp, some of the practices we've talked about: governance policies, data cataloging, metadata management, access and lifecycle management, not just the retention but when it'll be deleted, kind of lowering that attack surface, purging old data when it's not needed anymore, and monitoring it. So somebody still needs to own the quality of the data, even though it is not going through that ETL process. And then documentation of what it is, where it came from, and how it should be used.

PAUL KAISHARIS

Just one thing I would add on the benefit side: I think one of the key benefits of some of the more modern data lake technologies is that you can just bring the raw data in. Traditionally, historically, there's been a lot of complication, cost, and delay in having to do complex extract, transform, and load operations in a batch mode. So being able to bring that data in its raw format is one benefit, and Ryan touched on all the other ones. I did want to add one thing about the ETRM responsibility, I think, related to data lakes and providing necessary data. I think one of the responsibilities of an ETRM system, one like Molecule, is to be able to provide that data in real time, not in a delayed, slow batch mechanism. I don't think an ETRM system necessarily needs to be the one to provide the data lake technology, because there's companies out there that do that better than I think ETRM companies do. But it is a responsibility to make sure they get that data fast, to be able to feed into the data lake and feed into the analytics that we talked about. On how to prevent the swamp, another definition of which is a disorganized pool of data that's difficult to use: it's really directly related to the data interoperability we talked about earlier. I mean, the metadata layer, being able to effectively use that metadata and provide that meaning to the data, being able to properly catalog and keep track of the data and who has access to it. A lot of these newer technologies provide auto-detecting of schemas, a schema being descriptive information about that data. So using that information and auto-detecting the schema of that data provides that cataloging of the information. And data quality, we mentioned this already, is an important aspect of this, because even though the machines and technology can do these things, there still needs to be some level of a business user or subject matter expert involved in the workflow to make sure that the data going in is good. 
Bad data is what feeds the swamp. So you've got to have some kind of workflow governance to make sure subject matter experts can filter that data a bit.
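One way to picture the governance layer Paul and Ryan describe: a minimal catalog record that refuses datasets without an owner or a source. Field names and the validation rule are illustrative only, a sketch of the "every dataset needs a steward" idea rather than any real catalog product:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CatalogEntry:
    """A minimal data-catalog record: the metadata that keeps a lake
    from turning into a swamp (fields here are invented)."""
    dataset: str
    owner: str           # the steward accountable for quality
    source: str          # where the raw data came from
    ingested: date
    retention_days: int  # lifecycle: when it should be purged

catalog = {}  # dataset name -> CatalogEntry

def register(entry):
    # Refuse unowned or unsourced data at the door.
    if not entry.owner or not entry.source:
        raise ValueError("catalog entries require an owner and a source")
    catalog[entry.dataset] = entry
```

Raw data can still land in the lake unchanged; the point is that nothing becomes *discoverable* without ownership, provenance, and a retention policy attached.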

BEN HILLARY

Thank you. The next question, I'm sure, is one which many listeners on the webinar can sympathize with, and probably many of our panelists also. What are best practices for actually ensuring the right data is accessible, in a secure manner, to the right roles within the trading organization? So, Ryan, if we could start with you on that, and then we'll go to Alex, Paul, and Tim.

RYAN ROGERS

Certainly. So on the security side, encryption, obviously, but then also multi-factor authentication for the lake. I think beyond that, it's controlling the access: role-based access, just like people are used to with their ETRM or CTRM systems, but role-based access for the data lake. And then audits: having audit trails of who accessed what data and when, and then regular reviews of those trails.

PAUL KAISHARIS

Sorry I wasn't next.

TIMOTHY KRAMER

I'm sorry. Go ahead.

BEN HILLARY

Over to you.

PAUL KAISHARIS

Yeah, sorry about that.

BEN HILLARY

That's all right.

ALEX WHITTAKER

Yeah, I think for me it'd be about communication, about actually identifying what is the right data, how is it used, how important is it. I think one of the things I've learned at sort of a young company, a small company, is that I reckon it's probably best practice to actually have some sort of data specialist, a specific role with a data expert, quite early on in this. That's definitely something I'm considering at the moment. Again, you're sort of looking at this thing about centralizing all of that data in one place, so that information and communication come into one place. So one person has the full picture and can understand those fiddly issues like consistency and things like that, and where you're having to make sure prices are all done at half seven, or if there's a dog leg between half four and half seven, things like that. So I think having a specific person in charge of that, a specific data expert, I think would be best practice. But the usual things with technology: communication, taking your time to get the details lined up, and actually understanding what you're doing, why, and how. Because there's a huge return on investment in the time you spend doing that early on.

PAUL KAISHARIS

All right, so we've hit on a lot of these points, but I'm going to hone in specifically on security. I mentioned least privilege: start with don't give out access that you don't need to. Encryption is incredibly important. Also encryption at rest: whatever's on storage, make sure it is encrypted; and in motion, over the wire, anything that travels across the network, you need to make sure that data is encrypted. A lot of cloud providers talk about shared responsibility models, which basically means they do their part, but you also need to do your part. So even though cloud providers have a tremendous level of security they've put in place, and the SOC compliance, et cetera, as a company, a product company, whatever you are, you need to make sure you do your part on that side of it. And then we talked about strong authentication and authorization. Authentication, with two-factor authentication: you need to make sure that you're authenticating the people, so you know who they are. Authorization: Ryan mentioned role-based access, controlling what they can see based on roles. And one we haven't really mentioned is geofencing, where you can actually control who has access to that data based on location. So if you're vacationing in Costa Rica, maybe you shouldn't get access to the data. So there's things like that you consider, and then you make sure the security controls cover everything: structured, unstructured, and semi-structured. And we talked about the metadata: you can use the metadata in these systems to help define and implement those security controls. So we've talked a lot, but I mean, the tools and technologies are there; you still have to be very smart about how you use them.
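A deny-by-default role check in the spirit of the least-privilege principle Paul and Ryan describe, as a small Python sketch. The roles and dataset names are invented for illustration:

```python
# Roles map to the minimum set of datasets each job function needs.
# Anything not explicitly granted is denied, including unknown roles.
ROLE_GRANTS = {
    "trader":     {"prices", "positions"},
    "risk":       {"prices", "positions", "var_reports"},
    "compliance": {"var_reports", "audit_log"},
}

def can_read(role, dataset):
    """Deny by default; allow only what the role was explicitly granted."""
    return dataset in ROLE_GRANTS.get(role, set())
```

Production systems layer authentication, audit logging, and often attribute-based rules (such as the geofencing Paul mentions) on top, but the core decision is the same explicit-grant lookup.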

BEN HILLARY

And Tim, your thoughts?

TIMOTHY KRAMER

So, not so much anything I can add around the data access part, but it's more like what people do with that access. And so we would have issues with people: they still tend to want to run their own stuff inside of Excel, and then you'll have a scenario where you walk in in the morning and someone's like, oh no, we have a compliance issue. Last night we had a VaR or position blowout and we've got to report this to the CFTC in September of '25. And you're like, okay, so you spend 2 hours tracking it down, and it was part of a spread trade and there were no issues. And so there's 2 hours of my life I'm not getting back, and there really isn't a problem. So what we try to do is, we don't discourage people from trying to do their own work and figure things out, but we want people to make sure they use the risk system and use it the right way. And if it's lacking something, or there's an improvement, then make sure you get that into the risk system, rather than having people just taking the data, having half-baked data, and trying to run their own reports, where things may or may not come out of that that are useful.

BEN HILLARY

Thank you. Next question. There was no way we were going to be able to get through a webinar without having a question on this subject. So the question to everyone is, do you see AI or machine learning playing a greater role in data management within energy and commodity trading? Tim, if we could go back to you on that question please.

TIMOTHY KRAMER

Yeah, sure. So we talked about how we developed our index and then our ETF, AMPD. And, I may get this wrong, but in the past three months I believe you've seen 27 different new funds come out that are using AI. So in the commodity space, what they're doing right now is saying, okay, I'm going to look at all the different commodities, and when I need to roll the commodities, I'm going to use AI to select which ones are the best. And in order to implement that and get it into your risk system, that's an entirely new thing people are trying to get up to speed on. It's one of those things where, when you run a back test, it looks great, but how does that work going forward? The fact that you're actually interacting with the marketplace, did you change what would have happened in the past? So that's what people are looking at: not just the AI and the optimization of something in a new fund, but can you go to the next level and say, okay, did the AI change what the back test looks like, or didn't it?

BEN HILLARY

Excellent. Alex, your thoughts?

ALEX WHITTAKER

Yes, in terms of data management and energy trading, I think there'll definitely be a role for AI and machine learning. I also think people need to be very careful about focusing on the problem they're actually trying to solve. Often with technology, people get carried away with solution-based design and forget why they're doing something. So focusing on that balance between problem-based design and solution-based design, and having a clear goal in mind and sticking to it, I think is important. Ultimately you want to make sure you are helping yourself and getting the productivity gains you should be getting from something like AI, rather than getting caught up in a sale and getting a bit carried away. Over maybe the last ten years or so, a lot of people have used technology and not gotten the benefits from it that they should have. So in this next wave of AI and machine learning, if people focus on delivery and results, then perhaps they might learn from some of the mistakes made in the first wave of technology coming through energy.

BEN HILLARY

Paul, your thoughts?

PAUL KAISHARIS

I mean, who knows how that's all going to play out; I think there's still work to be done to figure all that out. But in particular around the private use of this technology, I don't think any company wants to put their data out there and have OpenAI's ChatGPT learn on their data. What you're seeing now, though, is OpenAI came out with an announcement just a few days ago about ChatGPT for business, where companies can build these large language models on their private data. McKinsey has a technology they've been promoting internally to help their consultants. So you already have these big enterprises starting to use this type of machine learning, AI, and large language models on their private data. So yes, I see that, particularly around private data. And of course, with all the things we're talking about, commodity trading systems need to be in play on all this, and feed all this, to be able to help the businesses in this space. I think that's a key role of systems like this.

BEN HILLARY

Ryan?

RYAN ROGERS

So part of the reason I was surprised by the poll, that almost a quarter of people are using data lakes, is that I have seen them used, but mostly at the very largest of my clients, the vertically integrated ones; it hasn't really trickled down into the medium and smaller shops yet. But at the companies where I've interacted with them, they are hiring incredibly bright data scientists, they're starting to hire up data engineers, they have an army of very smart people focused on this. And it suffers a little bit from the problem Alex mentioned, of solution-based design rather than problem-based. But I think there is enormous potential. I mostly work in financial but also physical commodity trading, and there are enormous amounts of semi-structured data that are critical to the operation. These aren't emerging needs; these are, for example, schedulers. If you have crude product schedulers or NGL schedulers in your shop, every single one of them has their own unique tool, usually Excel, where they're doing all their supply-demand balancing. And it's impossible to standardize all of that into one enterprise supply-demand forecast. So all of that incredibly valuable data that your schedulers are accurately managing is not in an enterprise system. That's something machine learning or a data lake could begin tackling intelligently, and it's hard to design a structured system, a structured scheduling tool or ETRM capability, for it. Don't get me wrong, all the ETRM systems will capture it after the fact, like an accounting approach: what did you nominate? You go put it in the system. What did you move? You put it in the system. What actually happened? You put it in the system. But on their spreadsheets they have the day-to-day forecasts of their supply-demand balance, and that is something that would be very valuable to tackle; machine learning is possibly a candidate for that.

BEN HILLARY

Thank you. Well, there's one more question from me and then we will move into questions from the audience. I see we've got a few questions from the audience already, so, audience, do have a look into the Q&A box, upvote any which are of interest, and add your own. So, final question from me, and I'll address this to everyone. We've seen an evolution over the years from data warehouses to data lakes. What's next? What's the next step in this evolution? This is the crystal ball question. Paul, if we could start with you, please.

PAUL KAISHARIS

Sure. I'll talk about the evolution real quick, but not for too long. Basically, we went from departmental use cases of data warehouses, so structured data, good for the finance department, the risk department, et cetera. Then we saw the evolution to big data, but the tooling was very complex and expensive to run. Then cloud providers came up, but the tooling was still very expensive. Now we're seeing lower costs from cloud providers and less complex tooling that provides access through common things people use, like Python or SQL or Excel. So with that evolution, and we've kind of hit this already, what's next is really about what's going to enable modern data science, the development of private large language models, and data analytics. To me, that's the real driver: okay, the tools, technologies, and data are there, put the right governance in place, have good data quality, but now you need to be able to use that information to do the data analytics you really need on a large data set, versus very small departmental-level data.

BEN HILLARY

Ryan, your thoughts?

RYAN ROGERS

Yeah, I think there are probably some big ones and then smaller ones. The big ones are maybe incorporating the data warehouse into the data lake, combined. You're always going to have a need for highly structured data for risk management and compliance, but placing that within the data lake means the same users can also access SCADA data, PI data, all of the other useful, less structured or less scrutinized data. And then some of the opportunities with machine learning and AI, beyond things like pulling in physical scheduling data and making use of previously inaccessible data, are things like machine learning starting to tag the data and inspect its quality. You can probably automate some of that work that was very laborious for humans.

TIMOTHY KRAMER

Yeah, I think Ryan nailed the two things. The first I'll just call one-stop shopping. People right now want instantaneous answers, so whether it's the data warehouse, the data lake, or the risk system, they want instant access to all of it, and they don't really care where it comes from as long as you're able to document it. So as long as you can grant that instant access, that's what they want. The second thing is the integration of AI with that. That's moving so fast, and there are so many demands right now: okay, I see the vanilla product, I want an AI product on top of that, what does that look like? And they want that now.

BEN HILLARY

Excellent. Alex?

ALEX WHITTAKER

Yeah, I think the next step in this evolution ought to be a focus on essentials, in the sense of getting the current technology working. I mean, we got to data lakes, so let's start focusing on delivery and results, getting the most out of what we have at the moment, learning that in detail, and then you'll be in a position to start adding to it when the next technological advancements come along. But if you don't stop and actually get what's here now working for you, right now, today, then you're just going to end up in a mess, on a sort of never-ending hamster wheel, basically.

BEN HILLARY

Thank you. Well, we've now got about ten minutes to take some questions from the audience, and I see four rather interesting ones already. I'll address these to the panel, and please, panel, do jump in with your thoughts. Firstly, from Ali Selik. Hi, Ali, hope you're doing well. Paul mentioned consolidating your data in one place, but what is your experience with data lakes versus regulatory requirements on data and where it is stored? How do you manage the data in, say, three regions, US, UK, and EU, versus the regulatory requirements in those regions? Great question. Who wants to take a stab at that?

RYAN ROGERS

Without going into too much detail, I do know some of the data lake providers and cloud providers offer geopartitioning. I haven't practiced that myself, but I know you can geopartition portions of the data.

PAUL KAISHARIS

Yeah, I'll add to that. We had to deal with this. First, to answer the question: you can't ignore those regulatory requirements, right? They're there and you have to account for them. At Molecule, for example, we're running in the US and Europe now, so we actually had to set up instances of our system in the US and in Europe, and data can't be shared between those because of the regulatory requirements. There are some capabilities available for bringing data together, multi-tenant kinds of capabilities you can lay on top of it, but most likely, because of the regulatory requirements, you can't store data from Europe in a US data center. You're not going to be able to do that. So you're going to have to make sure you bifurcate, separate that data, where part of it's running in the US and part of it's running in Europe. Now, maybe the trade-off is that you lose some of the insight from consolidated data, but I don't think you can ignore those regulatory requirements.
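A tiny sketch of the data-residency routing Paul describes, with hypothetical in-memory stores standing in for the regional instances, might look like this:

```python
# Hypothetical sketch: route each record to a region-specific store so that
# EU-origin data is never written to the US instance (data-residency rule).

STORES = {"US": [], "EU": []}   # stand-ins for separately hosted instances

def store_record(record):
    region = record["origin_region"]
    if region not in STORES:
        raise ValueError(f"no instance provisioned for region {region!r}")
    STORES[region].append(record)   # data stays in its home region
    return region

store_record({"trade_id": 1, "origin_region": "US"})
store_record({"trade_id": 2, "origin_region": "EU"})
print(len(STORES["US"]), len(STORES["EU"]))  # 1 1
```

In practice the stores would be physically separate deployments, and the routing decision would usually happen at ingestion, before any cross-region network hop.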

BEN HILLARY

Thank you for that. Next question, Joe Hollington. Good to see you as well, Joe. How easy is SSO to embed into the data cells, i.e. to ensure only licensed users get the data?

PAUL KAISHARIS

It's kind of a technology question, so I can take it. Joe, you're talking about single sign-on type technologies. Most modern systems provide single sign-on capabilities, where you have an identity provider, like Okta or even AWS, that authenticates you: who are you? And then the systems themselves provide the authorization layer: what can you actually do with that information? So most of the systems we're talking about, Molecule for one, also support SSO functionality, and the underlying technologies implement the security controls on top of that to control access to the data. I think that's what you're referring to, Joe, but let me know if that wasn't correct.
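The split Paul describes, the identity provider answering "who are you?" and the application answering "what may you do?", can be sketched as two separate functions. The token table and entitlements below are hypothetical stand-ins for a real IdP (such as Okta) and a real authorization layer:

```python
# Hypothetical sketch separating authentication (SSO / identity provider)
# from authorization (the application's own entitlement checks).

VALID_TOKENS = {"tok-123": "jane@example.com"}       # stands in for IdP token validation
ENTITLEMENTS = {"jane@example.com": {"read:marks"}}  # app-side licensed permissions

def authenticate(token):
    """IdP responsibility: map a presented token to an identity, or None."""
    return VALID_TOKENS.get(token)

def authorize(user, action):
    """Application responsibility: does this identity hold this entitlement?"""
    return action in ENTITLEMENTS.get(user, set())

user = authenticate("tok-123")
print(user is not None and authorize(user, "read:marks"))   # True
print(user is not None and authorize(user, "write:marks"))  # False
```

In a production SSO flow the token would be a signed JWT or SAML assertion validated cryptographically, but the division of responsibility is the same: the IdP never decides entitlements, and the application never verifies passwords.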

BEN HILLARY

Next question from Tiffany Main. How would you break down the data management vendor landscape? Are there good end to end solutions or is it better to develop your own best of breed?

PAUL KAISHARIS

I'll start and throw in my two cents on that one, too. I don't think anyone's won it yet, but if you look at the major players out there, we hear a lot about Snowflake, which has been around for a while and provides a data lake and the things we're talking about. I'm not promoting Snowflake or any of these technologies, but that's what's out there. Databricks is another one getting a lot of attention in terms of that landscape. And obviously you've got to consider the cloud providers, like AWS, that are starting to provide these kinds of solutions. Personally, I don't think I would embark on building your own best-of-breed kind of thing. That's getting into some pretty complex territory; the costs and the resources required to do some of these things are really high, unless you're a big enterprise that feels it has the resources to do it. I would recommend instead taking advantage of the tooling available out there. To give you an example of necessary tooling: at Molecule, and we're not a data lake provider, of course, we use the data streaming technology Kafka to stream real-time data to whatever destination it needs to go to, to feed the technologies and solutions we're talking about. So again, I wouldn't embark on building my own, but I would definitely look at the landscape out there. I just don't think anybody's won yet.
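The streaming pattern Paul mentions, one data source fanned out to multiple destinations, each with its own shape, can be sketched without a broker. Here hypothetical in-memory "topics" and transforms stand in for Kafka and its consumers:

```python
# Sketch of one source record streamed to multiple destinations, each with a
# per-destination transformation. In-memory lists stand in for Kafka topics;
# the destination names and schemas are illustrative only.

import json

DESTINATIONS = {
    "warehouse": lambda r: {"id": r["trade_id"], "notional": r["price"] * r["qty"]},
    "audit_log": lambda r: json.dumps(r, sort_keys=True),   # raw, serialized copy
}

topics = {name: [] for name in DESTINATIONS}

def publish(record):
    for name, transform in DESTINATIONS.items():
        topics[name].append(transform(record))  # same source, per-destination schema

publish({"trade_id": 7, "price": 80.5, "qty": 100})
print(topics["warehouse"][0]["notional"])  # 8050.0
```

With a real broker the producer would publish once and each consumer group would apply its own transform, which is what lets you add a new destination without touching the source system.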

ALEX WHITTAKER

Yeah, like Paul's saying, it's an interesting question, this one. I've spent quite a lot of time looking at Bonroy's technology infrastructure and how we would improve and invest in it as the company grows. And I don't really know any data management vendors at all. A CTRM would barely work without data, and you'd think, looking at these discussions, that the two ought to go hand in hand, and yet they're very separate. As Paul says, it's difficult to identify any data management vendor you could say is particularly strong or, you know, has won in ETRM, CTRM, energy and commodities. So it's something I'm interested to find out more about, and to actually talk to some of these vendors, because I think it's a path I'll be going down quite soon.

RYAN ROGERS

I would agree with Paul on not going down the build-your-own best-of-breed path in general, but I would also avoid the other extreme of the all-in-one vendors. Without naming names, some of the legacy nineties behemoths offer all-in-one solutions, but it's hard to imagine hiring smart data scientists and smart data engineers and then locking yourself into some antiquated infrastructure for all of these emerging capabilities. So I'd probably split the baby, like Paul was mentioning: go with the specialized vendors, things like Snowflake. And then, really, it starts with the people you hire. If you don't hire the right data scientists and the right data engineers and make the right decisions, it doesn't really matter who your vendors are.

BEN HILLARY

Excellent. Just had an interesting question come in from Stephen Nemo: are the ETRM vendors going to begin embracing existing data schema standards such as FpML or the ISDA CDM? Anyone want to give that one a go?

PAUL KAISHARIS

I'm not as familiar with these particular standards, but most of these are integration-type schema standards, and from an ETRM provider's perspective, to support them, we probably wouldn't initially change our core systems. What we do, and what we have in place, are extensions: the data streaming technologies I mentioned, where we can take any type of data in our ETRM system, market data, valuations data, whatever it is, and transform it to any other format. We can take one data source and have multiple destinations, multiple schemas we can support in how that data is delivered. Do ETRM systems themselves support those standards in the core? No. But, and I can only speak for Molecule, we have the ability to support those standards, and multiple standards, without a big lift on our side to get the data out of our system. We just need to change the transformation end piece to support those schema configurations. But again, I'm not that familiar with those particular standards.
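The "change only the transformation end piece" idea can be sketched as one internal trade record rendered into two outbound formats. The XML below merely gestures at an FpML-style document for illustration; it is not conformant FpML, and the field names are hypothetical:

```python
# Sketch: the same internal trade record is rendered into two outbound
# formats by swapping only the transformation function, not the core system.
# The XML shape is illustrative, not a real FpML schema.

import json
import xml.etree.ElementTree as ET

trade = {"trade_id": 42, "commodity": "WTI", "qty": 1000}

def to_json(record):
    """Outbound format 1: plain JSON."""
    return json.dumps(record, sort_keys=True)

def to_xmlish(record):
    """Outbound format 2: a simple FpML-flavored XML rendering."""
    root = ET.Element("trade", id=str(record["trade_id"]))
    for key in ("commodity", "qty"):
        ET.SubElement(root, key).text = str(record[key])
    return ET.tostring(root, encoding="unicode")

print(to_json(trade))
print(to_xmlish(trade))
```

Supporting an additional standard then means adding one more `to_*` function at the edge, while the record that the core system produces stays unchanged.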

BEN HILLARY

Very good. Well, we've got one additional question, from Tiffany Main. To an extent I think it's been covered in Ali's question, but perhaps there's an additional angle on data sovereignty and data ownership: has the panel faced challenges when it comes to multiregional data interoperability? How about data sovereignty issues? That one has probably been quite well covered in the previous question.

RYAN ROGERS

Yeah, I think geopartitioning probably solves that.

BEN HILLARY

Agreed, agreed. OK, well, that actually brings us very neatly to nearly the end of our time. So at this point, I would like to hand back to you, Kari.

KARI FOSTER

Great. Thank you so much, Ben. And huge thanks to all of our panelists today for the expertise and advice you brought to this discussion; a very interesting discussion. You may have seen, as you were registering for this webinar, mention of something called Big Bang. This is Molecule's forthcoming data-lake-as-a-service platform, an add-on that works in tandem with Molecule, and the launch of that product is imminent, which we're very excited about. We're actually planning a webinar for Big Bang in October, so please be on the lookout for that from Molecule in the coming weeks. I'll end my promotional bit there. Thank you all for coming, really appreciate it, and I will pass things back over to Ben to close out the webinar.

BEN HILLARY

Lovely. Thank you, Kari. So, yeah, just huge thanks to our panel for their insights today, and to you, the audience, for joining. The webinar recording will be sent via email to you all in the next two days. If you found it of interest, do please share it with your colleagues and your wider network. If Kari, myself, or any of the panel can be of any assistance, drop us a line or connect via LinkedIn. So, yeah, from my side, again, many thanks, audience and panel. You've been fantastic, and wishing you all an excellent day or evening ahead. Thank you.

RYAN ROGERS

Thank you.

TIMOTHY KRAMER

Thank you, Ben.

RYAN ROGERS

Thank you, Kari.

KARI FOSTER

Thank you.

ALEX WHITTAKER

Thanks, everyone.

Written by: Commodities People
