The Crocodylus porosus, also known as the saltie, estuarine or Indo-Pacific crocodile, is the largest of all living reptiles, as well as the largest land and coastal predator in the world.
Data lakes are huge, by definition. They exist to house the morass of unstructured and semi-structured data that is generally unfiltered, often duplicated, typically incompatible and low-level (i.e. logs, system status feeds, clickstream data) and, increasingly, machine-generated by Internet of Things sensors.
On balance, data lakes are considered a good thing. They allow organizations to make sure they record all the data they can channel through every operational pipe in their IT stack. Having access to as-yet-untapped data stores, when needed, is a comfortable place to be for the chief data scientist in any business. Regarded as a baseline move for businesses shaping their data strategy (who knows how the company might use sensor data x, y and z tomorrow, or next year?), the data lake also represents a democratization of data, i.e. a very deep pool that anyone can dip into at any time.
Data lakes also store structured data, such as information flows from customer relationship management systems or enterprise resource planning systems, but they are less often discussed in this role.
In our current AI-everything climate, organizations need visibility into their businesses and into the activities carried out by their customers. Data lakes help make this possible and also ensure that a business can consolidate around a single repository so that data silos do not start to grow… and that is a good thing.
Danger: deep water
As with all aspects of technology, there is a yin and yang factor to consider. Think back to pre-millennial (or at least pre-cloud) times, when an organization might have run 42 databases (and many ran more): users had to know 42 sets of database features and a corresponding number of access measures and security procedures. In a single data lake, however, it is theoretically possible for a person with the right credentials to access everything through one entry point. The legendary “single pane of glass” strategy that so many companies chase when it comes to data, applications and business operations becomes the same single pane an intruder has to break to get in.
This reality was underlined by Steve Karam, head of AI and SaaS products at DevOps platform company Perforce (also known for its heritage in version control and in application testing and lifecycle management). Speaking at a data analysis roundtable this week, Karam highlighted the extra risks lurking in the water.
“It is always important to remember that there is a Sam – and most organizations have a Sam; they have been with the company for decades and, during their tenure, they have built a data store that nobody else knows about.” But what if Sam’s data store includes duplicated personally identifiable information and the columns that hold that PII? This would be an ideal breeding ground for the crocodiles that live beneath the surface of the lake.
Karam invites us to add AI to the mix. Compared to analysts, who are specialist wranglers writing targeted queries to get what they need, he says AI has an “omnivorous, insatiable appetite” these days (he actually used the term datavore; well, someone had to coin it at some point) and that means it wants to eat everything. He sees it as something of a “blabbermouth” that leaks more secrets than a talkative family member at a holiday dinner after too much wine. The risk landscape explodes as a result.
Dipping our toes
“So we have a quandary: teams across every business depend on rapid access to data for development and testing, to get to market faster and optimize strategy… and data lakes are, essentially, useful things,” Karam said. “For an illustrative example, consider the fact that detailed data is increasingly necessary to meet demand in terms of customer experience. However, the risks are very real; our market research suggests that about half of organizations report that they have already experienced a data breach.”
So what is the answer? Curating and dividing data into different categories is a good starting point, Karam says; Microsoft’s medallion architecture is a good example.
Microsoft actually talks about this technology as a data lakehouse medallion architecture (a lakehouse being a middle ground between data lakes and structured data warehouses, offering the scale of the lake with the data management and transactional capabilities of the warehouse), and it is essentially a data design pattern used to organize data.
“The medallion architecture describes a series of data layers that denote the quality of data stored in the lakehouse. Azure Databricks recommends taking a multi-layered approach to building a single source of truth for enterprise data products,” notes Microsoft on its web portal.
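To make that layering concrete, here is a minimal sketch of how a bronze/silver/gold pipeline might look in PySpark with Delta Lake. The storage path and the column names (event_id, event_type, customer_id) are illustrative assumptions for the example, not anything prescribed by Microsoft or Databricks.

```python
# A minimal medallion-style sketch, assuming a Spark session with Delta Lake
# available; paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Create one schema per quality tier.
for layer in ("bronze", "silver", "gold"):
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {layer}")

# Bronze: land the raw, unfiltered feed exactly as it arrives.
raw = spark.read.json("s3://example-bucket/raw-events/")  # hypothetical path
raw.write.format("delta").mode("append").saveAsTable("bronze.events")

# Silver: cleanse and de-duplicate so downstream users see conformed records.
silver = (
    spark.table("bronze.events")
    .dropDuplicates(["event_id"])
    .filter(F.col("event_type").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")

# Gold: aggregate into a business-ready, consumption-friendly table.
gold = spark.table("silver.events").groupBy("customer_id", "event_type").count()
gold.write.format("delta").mode("overwrite").saveAsTable("gold.event_counts")
```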
What happens next is synthetic but, at the same time, very tangible and real.
Data masking and synthetic data
“The next step is to find ways to give non-production groups (by which I mean our friends in software application development) realistic data without the risk, so that means access to techniques such as data masking and the use of synthetic data for large-volume requirements such as unit tests.”
Static data masking replaces sensitive data such as personally identifiable information (remember Sam and those PII concerns?) with synthetic but realistic values that are deterministic and persistent, so as to maintain referential integrity and the demographic shape of the data. This means (in theory and indeed in practice) that software developers get genuinely useful data without the risk of accidentally exposing sensitive customer data.
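To show what “deterministic and persistent” means in practice, here is a minimal Python sketch that uses a keyed hash so the same input always maps to the same pseudonym. The field names and the secret key handling are assumptions made for the example; they do not represent any particular vendor’s product.

```python
# A minimal sketch of deterministic static data masking; a plain dictionary
# stands in for one customer record, and the secret key would normally be
# managed outside source control.
import hmac
import hashlib

SECRET_KEY = b"rotate-me-and-keep-me-out-of-source-control"  # assumption

def mask(value: str, *, keep_last: int = 0) -> str:
    """Replace a sensitive value with a stable pseudonym.

    The same input always yields the same output (deterministic and
    persistent), so joins and referential integrity across tables survive.
    Optionally keep the last few characters for realism.
    """
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    tail = value[-keep_last:] if keep_last else ""
    return f"MASK_{digest[:10]}{tail}"

record = {
    "customer_name": "Samantha Example",
    "ssn": "123-45-6789",
    "account_number": "GB29NWBK60161331926819",
    "balance": 1042.17,  # non-sensitive analytics value, left intact
}

masked = {
    "customer_name": mask(record["customer_name"]),
    "ssn": mask(record["ssn"]),
    "account_number": mask(record["account_number"], keep_last=4),
    "balance": record["balance"],  # developers can still spot spikes/anomalies
}
print(masked)
```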
As an example, development teams in a bank could see a customer’s balance and look for anomalies, spikes or other outliers, but they would have no idea which customer it might belong to. The date of birth, social security number, bank account number and other personal identifiers would all be masked. Many organizations are likely to have a place for both techniques, supported by highly automated tools to alleviate any additional workload for developers.
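For the synthetic data side of the equation (the large-volume requirements such as unit tests that Karam mentions), a sketch along the following lines shows how wholly artificial records can be generated in bulk. It assumes the open source Faker library, and the field names simply mirror the masking example above for illustration.

```python
# A minimal sketch of bulk synthetic test data using Faker (pip install faker);
# field names are illustrative only and no real customer data is involved.
import random
from faker import Faker

fake = Faker()
Faker.seed(42)    # reproducible test fixtures
random.seed(42)

def synthetic_customer() -> dict:
    """Generate one entirely artificial customer record for testing."""
    return {
        "customer_name": fake.name(),
        "ssn": fake.ssn(),
        "account_number": fake.iban(),
        "balance": round(random.uniform(0, 250_000), 2),
    }

# Large-volume requirements (e.g. unit or load tests) without touching real PII.
fixtures = [synthetic_customer() for _ in range(10_000)]
print(fixtures[0])
```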
Clean and compliant
“New AI use cases can also help here. In addition to synthetic data, AI is being used for automated natural language processing-driven test creation, relieving test teams of the burden of writing test scripts and of maintaining data relationships with production,” Karam said. “Even if an organization is already ‘all in’ on data lakes, it should continue to treat software development and quality assurance data as separate data environments that are de-risked, robust, clean, compliant and delivered quickly, so that teams can build without concern.”
The main providers in the data lake arena include Amazon (whose AWS S3 simple storage service is the technology underpinning a large number of data lakes), Microsoft with Azure Data Lake and the company’s data lake analytics services, Google with its BigLake (loved by those who want to build an Apache Iceberg lakehouse), AI data cloud company Snowflake and Databricks, with its close relationship with Microsoft.
Although Perforce did not push its own agenda or messaging in this discussion, the company competes in version control with Git, Atlassian Bitbucket Data Center, Apache Subversion and Mercurial, to name a handful. In software testing, Perforce shares its market with BrowserStack, Sauce Labs, LambdaTest and (when is that company not somewhere on most lists?) others.
Taking the steps and approaches outlined here could help detect, ring-fence and mitigate the dangers lurking around the data lake and balance its role against the need for protection. The crocodiles may still be circling, but there are safe ways to enter the water if we know what kind of protective clothing to wear. These procedures may not kill the crocodiles in the lake (malicious attackers and ne’er-do-wells), but they may mean that some of them are forced back to the shore.


