
Data Processing Quotes

There are 462 quotes

"Neural nets are these mathematical expressions that take input as the data and the parameters of the neural net for the forward pass, followed by a loss function that tries to measure the accuracy of the predictions."
"Data analysis is taking raw data and creating useful information."
"It's technically a biased compression algorithm... it takes the complexity of the world and compresses it to a few simple axioms."
"The idea of deep learning is that you don't need to manually engineer the features; instead, you learn these features just from raw data."
"The introduction of attention was developed...and it makes it very efficient and capable of capturing long-term dependencies."
"Processing very large amounts of data, machine learning is all the rage."
"Our brains are the fastest data processors on earth, far more efficient than any computer you have ever used."
"Kafka streams API allows reprocessing... it is one that is really important."
"A codec is a portmanteau of coder decoder because it's responsible for encoding and decoding a digital data stream."
"This also enables you to process data far more efficiently and effectively."
"Every single chakra is bringing in information or data, interpreting it, sharing it with your brain."
"Overall, the process gives you a smoothed out version of the original data."
"Hadoop MapReduce: a whole new paradigm in processing data in a parallel way."
"The open telemetry collector is a way for developers to receive, process and export telemetry data to multiple backends."
"Computers are machines that give data outputs based on data inputs... they are serenely undaunted by complexity."
"So this is a wonderful way to process a sequence of calculations for a range of cells."
"You can try to put raw data in one end and have this very complicated high dimensional problem. The ideal of end-to-end is one simple output at the end."
"What they do is they basically slurp in a huge data set from somewhere in the world."
"Fold L... it's going to apply an operation on each item of a list... it went through all these different list items from left... and multiplied them all together."
"Kafka streams are mainly used for data processing and transformation; you can enrich data, transform data, perform filtering, grouping, aggregation, and a bunch more."
"The data actually has the ability to process that."
"We're simply converting raw data into useful information."
"Now you have an idea of what happens to the hydrant anchor signal and how does it end up being fed into 480 C's at 64-bit or sample per second."
"So what we are going to be doing is building an AI tool that can take a file with thousands and thousands of articles, sift through it, and basically tell you what each article is about."
"Columns function: counting columns dynamically."
"Extracting fields vertically instead of horizontally."
"Now that we have a stream of millions of 6-bit symbols yielding hundreds of megabits of data per second, in order to turn it into your favorite TV show we use the advanced video codec, or h.264 format."
"The map function allows performing a calculation on every item in a list and returning a new list."
"Use the pipeline operator to nest function calls, like filtering even numbers and then multiplying them."
"The robot needs intelligence, it needs to sort data from sensors, encoders, Motors, Network traffic code, video streams, microphone inputs, physical conditions, and respond to them in real time."
"That shows you now that with the power of walk you can write programs that can parse all kinds of lists and logs."
"Imagine you're mining for gold... That's what AmazeOwl does, the heavy lifting for you."
"With 500,000 consensus events per second, we can kick off a level of automation that would be impossible inside of a data center."
"Every time we get a new chunk of data from the buffer, we can start using it."
"BigQuery solves many of the issues with your traditional data warehouse system."
"Functional languages naturally lead you to process data structures while avoiding side-effect I/O."
"We're designing a dojo supercomputer... billions of miles of data."
"One hot encoding is a trick for taking data that is categorical and splitting it up into a format that XGBoost and a lot of other algorithms can use."
"Now, YouTube videos, Netflix, Amazon, BBC iPlayer, all of them crunching through vast amounts of information on their websites."
"Edge computing devices are IoT or sensor devices that have intelligence themselves, so they can not only collect data, they also pre-process the data for you."
"Our mind is doing that constantly it's taking in data data data."
"The big advances that we've seen in AI has come about when people have done exactly that: just throw 10 times more data and 10 times more compute power at it."
"Python: Widely used for complex data processing algorithms."
"Python language: powerful for complex data processing and signal processing tasks."
"CPUs like working with a fixed amount of data."
"Let Power Query do all of that for you and automate that entire process, and best of all, you don't even have to know any coding to do this."
"Handling all that information explicitly is often impossible, so we make a fantastically liberating assumption: linearity."
"Functional programming lets you rewrite things into a beautiful pipeline for your data."
"When we're using this cache to estimate the radiance coming from a particular direction, it's just a simple lookup."
"BigQuery... has the capability to process billions of rows in seconds."
"Capability to process billions of rows in seconds."
"I'm going to use JSON loads, we're going to load every single line in there."
"Kinesis... data processing in a massive, massive volume."
"If you want to visually have a loading animation or something that informs the user that something's happening while that data is waiting to resolve, you can actually use the Nu loading indicator."
"Removing seams in data using the Build Virtual Raster tool."
"BigQuery allows massive processing of data at high speeds."
"Now the Delta location in the gold container is having two levels of transformation."
"We're moving our resource needs off-box to this device off the data plane that just exists somewhere."
"I'm just sending the output of each command to the next command after the pipe. It is not space sensitive. It is not newline space sensitive."
"Dataflow encapsulates your entire data processing task from start to finish."
"The distributed data set that your Beam pipeline operates on is called the P collection."
"When you design a pipeline, you can choose from linear pipeline, branching pipeline, or merging pipeline."
"Pub sub handles exactly once delivery, and data flow handles the deduplication, ordering, and windowing."
"Normalization keeps the math from breaking by taking and tweaking these values just a little bit."
"Now we've performed all these Transformations, it's recorded all the steps so I can use them again on any files that I add to that folder in the future."
"Python is much preferred over shell scripting for advanced data processing tasks."
"Piping takes the result of one command and sends it into another command."
"...drones offer a cost-effective way to gather and process data enabling you to complete projects more quickly and accurately than ever before."
"Enhanced privacy and security by keeping data and AI process local sensitive information can be processed on the edge devices without being transmitted to the cloud minimizing the risk of data baches and ensuring compliance with data privacy regulations."
"The thalamus, it's basically sorting data and sending it where it needs to go."
"I think we can all agree that is so much easier than using text functions or text to columns."
"Google Cloud Dataflow is the right technology for a mix of batch and stream processing."
"Databricks used Spark and the great thing about Spark for anyone who's used it is it is not language agnostic but it is more of a polyglot kind of processing engine."
"It's just like an extra distance control, so you can see values that are five and greater are now getting set to zero so that we don't get all of the ones that are far away from the curve."
"There are pipelines not just for NLP but also audio and image processing as well."
"Essentially, even if you had to do data text to column and assuming there's nothing chosen in step two and three, I can simply press finish."
"People look at Kinesis and they get really confused."
"Kinesis is really used for data processing at scale."
"Kinesis fits in the analytics category of AWS."
"Kinesis is very diverse and has a lot of potential in a variety of different use cases."
"Pipelines and we'll get into Data flows and we'll get into the basics of that but pipelines compared to a notebook if I just want to copy data pipelines are going to be a little bit more cost-effective generally speaking."
"We are actually working on the AI as well and right now our team is busy creating a model of Dr. Bean's all the videos transcripts."
"The model is compressed through this filter."
"Using SIMD instructions can significantly speed up processing large amounts of data."
"Delegateable operations enable efficient data processing."
"What we actually need is a change to how data is physically stored by the processing layer."
"Avoid 'first filter' function for large datasets; use 'lookup' function for better performance."
"...the way that we take in data and then I've always thought of it sort of as a pinball machine where it comes in and it hits something else which then bumps into something else and then it re-congeals into this sort of new and unique idea."
"DataFrames seamlessly integrate with Spark's ecosystem, including Spark SQL, MLlib, and GraphX, enabling users to leverage various libraries and functionalities."
"DataFrames can be easily converted to and from other data formats, such as Pandas DataFrames in Python, enabling seamless integration with other data processing tools."
"To sum it up, DataFrames provide a more user-friendly and optimized approach to work with structured data in Apache Spark."
"DataFrames provide Spark with schema information, enabling it to optimize the execution of queries and perform predicate pushdowns, leading to faster and more efficient processing."
"DataFrames offer a high-level, SQL-like interface, making it easier for developers to interact with data compared to the more complex RDD transformations and actions."
"Spark is particularly valuable for data engineers, data scientists, and big data professionals who deal with large-scale data processing."
"The Medallion architecture: bronze, silver, and gold layers representing stages in data processing and refinement."
"Spark SQL: supports interactive queries and is a significant part of Spark, leveraging structured data processing."
"So they're now computing the same thing but with 10x less overhead and you know 10x less cost, and so that's a pretty kind of powerful thing here that what delta's scalable metadata can really bring to Apache Spark."
"Delta Lake gives me the ability to kind of filter through massive amounts of data, shrink it down to something that will fit into pandas, and then I do some graphing with it."
"Interactive exploration with Apache Spark? Create a notebook!"
"I suspect when I do progressive loading, it's batch loading. So I'm checking everything for that specific batch, right?"
"The average human brain is running more than a couple of trillion bits of data every second."
"Maths works even while auto-update is running, so... when as in this case the phase shift between the two channels is changing, the green math curve changes as well."
"So, in this kind of little sample here, we have basically data coming in via FTP, we have a polling component over here, the inbound Channel adapter."
"Anything that gets streamed to there, push it into this fire hose for me to do some work on it as well."
"It's not just the data warehouse, so it has a data warehouse engine, an MPP data warehouse engine, right, where you add more machines. It's not just bigger and bigger and bigger machines. It's a cluster of machines."
"...the problem we're interested in is data parallel programming, parallel meaning that you are processing the same input but you have multiple workers that are working on that input..."
"The challenge in satellite remote sensing is no longer data availability but rather how to store and process all the information."
"128 epix10 this means that we're going to send 128 rows of data or flowers at a time to be processed."
"It helps you work, process, and filter tabular data so you can get your machine learning models out there a whole heap faster."
"From application point of view there won't be or the from the end users perspective they won't notice any difference because the geometries have gone from blob or binary to a spatial type of genre."
"Opening DICOM files in Python with PyDICOM is common in medical imaging."
"Now we're going to create a function that can take this file and turn it into a dictionary mapping the words to vectors."
"Data refinement along the data value chain unlocks value from insights. Raw data is continuously hydrated, meshed, and refined by our data integrations and data engineering team to create reliable pipelines in the lake house across various SLAs."
"It's a mash-up of the 'a' and 'b' vectors. We've got our eight bit control for a set of eight packed singles."
"So this is for a scenario where we want to pull in individual elements from all different places in memory and assemble them into a single vector."
"That removes the engineering bottleneck."
"This is just one flavor of 'Auto awesome,' and you can see here sometimes in the query plans you'll see these like shuffle repartition z', these are essentially us just sort of realizing that we have to add more shards."
"If you use approximate distinct it'll run like 60 times faster."
"Data Flow is recommended for existing Hadoop, Spark applications, or large batch jobs."
"...this can tell us how much data the load balancer itself is actually processing for the clients."
"So this is the underlying file format the Delta tables now on top of that we need a query engine that's capable of reading those files and giving us useful insights and results and this is where Apache spark kicks in."
"Structured streaming is the feature inside of Spark that allows us to work on data in near real-time as it arrives."
"We make not only easy to bring data easy to kind of process it look at it unify harmonize all that stuff but you can now act on it."
"For analysis purpose it's easiest if you think of things happening serially."
"Studying biological problems with graph representation learning is very much likely here to stay because there's a lot of data sitting and waiting to be processed."
"You're also going to be able to slice up these arrays."
"You're also going to be able to flip arrays."
"There's also subtract, multiply, and divide."
"You're also going to be able to save as a comma separated value file."
"The network is constructed forward from the data, it doesn't pre-empt signals, but just data provided for it."
"Changes from large collection can be processed in parallel by multiple consumers."
"There are three ways you can consume change feed."
"Delta Lake was designed from the ground up to allow one single system to support both batch and stream processing of data."
"The Delta Standalone project is an independent implementation that can read and write the Delta log files, providing underpinning support for various connectors."
"The need for large-scale real-time stream processing is becoming more evident every day."
"This is what Blender is reading, we now have a list from Python from the CSV reader containing the contents."
"Auto loader makes sure that your data is processed only once."
"Converting raw data into useful information for decision makers."
"Once you get past configuring these steps, it's jackpot time."
"In some cases, you want to fail fast meaning, you know, first broken record and you stop the execution."
"The focus with snowflake is you can land that semi-structured data in a data type and then you can use all snowflakes native functions to flatten that data for normalize it."
"This is the reason suffix trees were invented."
"We are this data refinery, accepting raw data from our customers and processing it."
"One of the benefits of query pipelines is the fact that you can run them in an async or parallel fashion."
"The brain is the control center that receives data, analyzes it, and sends signals back to your body, commanding it to respond."
"The graphics pipeline is a linear sequence of stages where each stage takes as input some data, performs some operations on it, and outputs the transformed data as input into the next stage."
"The reason that this random function is okay on a small data set and terrible on a large data set is the result set has to be produced in its entirety first."
"Once our acquisition is complete, we now have a raw Airyscan dataset that is ready for processing."
"We're going to combine all those features and we'll jump into splitting the data."
"When we are visualizing data, there are not just these two steps: processing, visualizing it, but there is the next step, which is really the experience, the sensing."
"Artificial intelligence is trying to change the world by changing how we analyze and process information."
"You need to know how it went from the raw sequence all the way to VCFs."
"To actually import transactional data into Excel, you need to click on the file and then click open and click browse to better understand where your files are located."
"Data wrangling is a process of cleaning, structuring, and enriching raw data into a desired format for better decision making."
"This is a pretty good example that allows you to touch on a lot of different technologies for the purpose of data processing."
"Coalesce cannot increase the number of partitions that you currently have; it can only decrease."
"This gives us fault tolerance and it gives us replayability, which is very important."
"Apache Spark is an open-source analytics engine for large-scale data processing."
"Your brain is processing data at all times."
"Machine learning is simply an approach to making a lot of small decisions."
"Learning how to process data, using a programming language gives you freedom."
"Example of pull or poll-based events is DynamoDB stream event, a Kinesis stream event or an Amazon SQS queue event."
"So that's the beauty of filtering it through this unique values the unique list function that we've just implemented."
"Stream processing is one of those things that can be harder and easier than you think."
"Delta Lake supports both batch and streaming, which is very important."
"Delta Lake also works for streaming; it can be used as a source and also as a target."
"We will extract the data, clean it, perform some transformation, and store it in the Spark managed table."
"We have covered the first step; we have extracted the data and the second step we have transformed the data."
"The role of business intelligence is to convert the data into information."
"We're using a divide and conquer approach here; we're going to have some large data set that we want to process quickly."
"I've never been to this stadium, so I took a shot at reimagining this same data and leveraging this concept of schemas to help me and my end users process the data more efficiently."
"The fundamental tenet of data orientation is to give primacy to the data, to the processes on the data."
"Now what you can do is create a full pipeline which just takes a PDF and outputs all the different tables in some separate CSV files."
"The most fun way to create a PCollection is by transforming an existing PCollection."
"You want to basically say, 'I don't have to choose between batch or streaming; I just want to be able to handle both at the same time.'"
"The navigation system is able to process so-called swarm data via its Risk Radar service, which anonymously captures data about traffic and road conditions from vehicles with relevant equipment."
"Data proc has native integration into cloud logging and Google stack driver."
"Any digital computer carries out five functions: it takes data as input, stores data and instructions in its memory, processes the data and converts it to information, generates output, and controls all the above steps."
"The idea with Arrow is to agree on a common efficient for data processing in memory representation."
"Data transformation provides the standardized way of transforming the data from the raw amount to a standardized form and then actually preparing them for analytics."
"We're going to use the diffusion process to destroy all the structure in the data and then we're going to learn the reversal of this diffusion process."
"I think it's amazing that all of that information from the input to the output was captured in that one little blip."
"UNIQUE looks at your range and returns all the unique values, discarding any duplicates."
"We can actually loop over this result set and just print out for every item I find in the result set, print out the item dot text."
"Explode function separates key-value pairs into two columns, making it easier to flatten out complex data."
"YAML Python libraries are also capable of serializing whole Python objects and not just raw data."
"Extract, transform, load - it means I'm going to extract data from some source... then you're going to transform that data... and finally you go and you load your data into your final destination."
"File notification mode is mostly suitable whenever we are processing millions of files under a huge number of directories."
"Massaging data means cleaning it up, doing some mutations that don't really change the meaning of the data but canonicalize it, standardize it."
"By adding more GPUs, your model is able to see more data on each training step."
"Dataflow is a managed service for executing a wide variety of data processing patterns."
"I'm going to show you how to read from CSV files, Excel files, even off of tables on HTML sites."
"The girls have been so well behaved during this whole ordeal."
"The Spark UI is the key to understanding what's going on in your Spark job."
"I see a 5 to 6X improvement from RDDs to DataFrames."
"At present, Spark exists as a next-generation real-time and batch processing framework."
"There's this entirely wonderful new set of techniques called compressive sensing that are just emerging."
"Power query's function-based, case-sensitive language also known as the M language."
"Kafka can also be easily used to build a modern and scalable ETL, change data capture, or big data ingest systems."
"Input is text, output is knowledge graph."